Your Burner Account Is Not Safe. LLMs Can Find You.
The assumption has been baked into internet culture for decades: use a different username, post from a different account, and you're protected. Not anonymous - everyone understood that a determined state actor or a law firm with resources could eventually find you. But practically safe. Safe enough for the whistleblower, the dissident, the abuse survivor running a parallel life, the activist who needs to organize without their employer knowing.
That assumption is now dead. Researchers have demonstrated that large language models - the same technology behind ChatGPT and Claude - can strip pseudonymity from social media users at industrial scale, at up to 90 percent precision. And it doesn't require a nation-state's resources. It requires an API key.
The paper, "Large-scale online deanonymization with LLMs" (arXiv:2602.16800), was published in late February 2026 by researchers including Daniel Paleka and Simon Lermen. The core system works in three steps: extract identity-relevant signals from raw text, use semantic embeddings to find candidate matches, then use an LLM to reason over those candidates and verify the match while filtering false positives.
This is the part that separates it from everything that came before. Classical deanonymization - the kind described in the famous 2008 Netflix Prize attack paper - required structured data: two databases with similar schemas that could be algorithmically linked. The LLM approach requires no such structure. It works directly on free text: posts, comments, interview transcripts, anything a person has written.
"Starting from free text - like an anonymized interview transcript - they can work their way to the full identity of a person. This is a pretty new capability. Previous approaches generally required structured data." - Simon Lermen, co-author
Three datasets. Three scenarios. Each one worse than the last.
The first experiment linked Hacker News accounts to LinkedIn profiles using cross-platform references found in the profiles themselves. The LLM agent, given internet access, performed this matching at a level that would take a skilled human investigator hours per subject - automated, at scale, overnight.
The second used Reddit movie discussion communities - five subreddits including r/horror, r/Letterboxd, and r/TrueFilm. The more movies a user had discussed, the easier they were to identify. With ten or more shared movie references in their history, 48.1 percent of users could be identified at 90 percent precision. That is not fringe accuracy. That is a usable surveillance tool.
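It's worth unpacking what "identified at 90 percent precision" means operationally: the attacker sets a confidence cutoff so that at least nine of every ten matches they accept are correct, then measures what fraction of users they still recover. The snippet below illustrates that operating-point calculation on synthetic scores - the data is invented for demonstration, not drawn from the paper.

```python
# Illustration of "X% of users identified at 90% precision": given
# per-match confidence scores and ground truth, find the largest set
# of accepted matches that keeps precision >= 0.9, then report the
# fraction of true matches recovered. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
is_correct = rng.random(n) < 0.6            # hypothetical ground truth
scores = np.where(is_correct,
                  rng.normal(0.8, 0.1, n),  # correct matches score higher
                  rng.normal(0.5, 0.1, n))

order = np.argsort(scores)[::-1]            # most confident first
correct_sorted = is_correct[order]
precision = np.cumsum(correct_sorted) / np.arange(1, n + 1)

cutoffs = np.where(precision >= 0.9)[0]
k = cutoffs.max() + 1 if cutoffs.size else 0
recall = correct_sorted[:k].sum() / is_correct.sum()
print(f"accepted {k} matches; recall at 90% precision: {recall:.1%}")
```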
The third, and perhaps the most unnerving: the researchers took a set of 125 people who had participated in Anthropic's own research interviews about how they use AI in daily life. From the anonymized transcripts alone - responses to fairly generic questions - the system correctly identified 7 percent of participants.
"The fact that AI can do this at all is a noteworthy result. And as AI systems get better, they will likely get better at finding more and more identities." - Simon Lermen
Seven percent sounds low. But consider: these were subjects who gave one interview, answering vague questions, with no cross-platform writing history to draw on. And even then, one in fourteen was identified. From a single conversation.
The obvious targets are activists, whistleblowers, and people in countries where the wrong opinion can land you in prison. But the attack surface is far broader.
Domestic violence survivors who maintain separate online identities to avoid abusers. Employees who moonlight in fields their employers would punish them for. People with stigmatized health conditions who discuss them in forums under pseudonyms. Mental health communities on Reddit. Anonymous political commentary. The whole architecture of "I can say this here because they don't know it's me" collapses under this framework.
The researchers note that the technique scales. You don't run it on one target manually. You run it on hundreds of thousands simultaneously. That's the second-order effect worth sitting with: this isn't a tool for finding a specific person. It's a tool for building a database. Every pseudonymous user, cross-referenced against every public dataset, producing a confidence score. Commercial data brokers would pay for exactly this.
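The economics of that scale are brutal. Once texts are embedded, comparing every pseudonymous account against every candidate identity reduces to matrix multiplication - cheap enough to run on commodity hardware, and real deployments would use approximate nearest-neighbor indexes to go faster still. The sizes below are illustrative assumptions, not figures from the paper.

```python
# Why bulk linkage is cheap: once texts are embedded, scoring every
# pseudonymous account against every candidate identity is a matrix
# multiply. Sizes are illustrative, not from the paper.
import numpy as np

n_accounts, n_profiles, dim = 10_000, 100_000, 384

# Stand-ins for precomputed embeddings of posts and public profiles.
accounts = np.random.rand(n_accounts, dim).astype(np.float32)
profiles = np.random.rand(n_profiles, dim).astype(np.float32)

# Process accounts in chunks to bound memory; keep each account's
# best-scoring profile as a linkage candidate with a confidence score.
best_score = np.full(n_accounts, -1.0, dtype=np.float32)
best_match = np.zeros(n_accounts, dtype=np.int64)
for start in range(0, n_accounts, 1_000):
    chunk = accounts[start:start + 1_000]
    sims = chunk @ profiles.T                          # (1000, n_profiles)
    best_match[start:start + 1_000] = sims.argmax(axis=1)
    best_score[start:start + 1_000] = sims.max(axis=1)
```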
The paper proposes some fixes. Rate limits on API access to user data. Detection of automated scraping. Restrictions on bulk data exports. LLM providers monitoring for misuse patterns in deanonymization-style queries.
These are reasonable recommendations with a fundamental structural problem: the data is already public. Millions of posts, years of writing, are already indexed and available. Rate limits slow the pipeline - they don't stop it. Scraping detection is an arms race that scrapers have historically won. And "LLM providers monitoring for misuse" requires LLM providers to make a choice about what constitutes misuse and enforce it consistently, which is not their current posture.
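The arithmetic behind that critique is easy to check. The numbers below are illustrative assumptions, not figures from the paper, but the conclusion holds across any plausible values: rate limits change the timeline, not the outcome.

```python
# Back-of-envelope: even a strict per-key rate limit only slows a
# patient attacker down. All numbers are illustrative assumptions.
rate_limit = 100            # requests per minute per API key
posts_per_user = 50         # posts fetched per target
targets = 500_000

minutes = targets * posts_per_user / rate_limit
print(f"one key:  {minutes / 60 / 24:.0f} days")        # ~174 days

keys = 50                   # additional keys are trivially acquired
print(f"{keys} keys: {minutes / keys / 60 / 24:.1f} days")  # ~3.5 days
```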
The more durable fix is behavioral: assume every pseudonymous account can be linked to your real identity. Write accordingly. Don't mix the topics you discuss in a way that creates a fingerprint. Don't reference personal details - even oblique ones like your job sector or city size - across platforms. Treat every post as potentially attributable.
That is a significant change to how people use the internet. Most people will not make it.
This research arrives in a specific political moment. Surveillance capabilities that would previously require state-level resources are becoming commodity tools. The practical protection that pseudonymity offered was never legal - it was computational. The math has changed.
What the researchers have essentially shown is that the implicit social contract of pseudonymous internet participation - "you can be found if someone really wants to find you, but casual pseudonymity is safe" - is no longer accurate. The "really wants to find you" threshold has dropped to the price of a cloud compute budget and an LLM API subscription.
For platforms, this creates a liability question they have not yet confronted publicly: if your infrastructure enables bulk collection of user data that can be fed into these pipelines, are you complicit in what happens next? If a dissident is identified through their Reddit posts and arrested, what did Reddit's data export policy enable?
These are not hypothetical questions. They are questions that will be answered in the next two to three years, in courtrooms and in press releases.