One prompt. 120 characters. Millions of dollars in cancelled research funding. DOGE didn't read the grants. It didn't hire experts. It fed internet summaries to ChatGPT and let an LLM make the call. The results were "sweeping, and sometimes bizarre."
The NEH has funded American humanities research since 1965. A 120-character ChatGPT prompt may have ended much of it. (Unsplash)
The prompt was not complex. According to reporting by the New York Times, employees from the Department of Government Efficiency - DOGE - arrived at the National Endowment for the Humanities with a specific mandate: cancel every grant touching diversity, equity, or inclusion. What they didn't bring was the patience to read grant applications.
So they used ChatGPT. The prompt, according to the Times: "Does the following relate at all to D.E.I.? Respond factually in less than 120 characters. Begin with 'Yes' or 'No.'"
What DOGE fed into this prompt were not the actual applications - those run to dozens of pages, with supporting documents, methodologies, and peer-review materials - but short summaries scraped off the internet. The AI read the summary. It answered yes or no. The grant lived or died.
The results, the Times noted, were "sweeping, and sometimes bizarre." Researchers whose work touched on topics the model associated with DEI - even tangentially, even historically - found themselves cut. Projects on civil rights history. Oral traditions of Indigenous communities. Analysis of how certain demographics have accessed education over centuries. ChatGPT saw "diversity-related" content and flagged it. DOGE cancelled it.
This is what AI-powered bureaucracy looks like at scale. Not a dystopian warning. Not a hypothetical. Something that happened last week, to real researchers, with real consequences.
The NEH funds projects that don't generate commercial returns - oral histories, archival preservation, translation of ancient texts. (Unsplash)
The National Endowment for the Humanities was created by Congress in 1965, signed into law by Lyndon B. Johnson alongside the National Endowment for the Arts. For six decades it has funded the kind of research that markets don't. Not because it isn't valuable - because its value is long-term, diffuse, and doesn't show up on a quarterly earnings report.
NEH grants fund things like: archival digitization of newspapers from the 1800s, oral history projects capturing the testimonies of people who lived through major historical events, translation of medieval Arabic manuscripts, studies of how languages evolve, documentation of folk music traditions before the last practitioners die. The grants are typically in the $50,000 to $500,000 range. The projects they fund often produce work that forms the foundation of other research for generations.
In the current fiscal year, the NEH budget is approximately $207 million - a rounding error in the federal budget but significant for American cultural and intellectual infrastructure. The grants it awards go through a rigorous peer review process. Panels of subject matter experts evaluate applications on criteria including intellectual merit, methodology, and expected contribution to the field.
That process took years to build. DOGE bypassed it in days.
The mechanics of what DOGE did at the NEH reveal something important about how AI gets used when speed is the priority. This wasn't a carefully designed automated system with guardrails and human review. It was improvised. Someone with access to ChatGPT and a mandate to cut DEI-adjacent spending found a way to process a large list quickly.
Step one: pull grant summaries. Not the full applications - the public-facing summaries available on the NEH website and in grant databases. These are short, often a paragraph or two, written to be accessible to general audiences rather than to accurately convey the full technical scope of the research.
Step two: feed each summary to ChatGPT with the binary prompt. The model's instruction was not to weigh the academic merit of the work, or to determine whether it technically violated any executive order on DEI. It was simply to say whether the summary "relates at all" to DEI. The threshold was "at all." Not "is primarily about." Not "is designed to advance DEI programs." Any connection, however peripheral, would trigger a yes.
Step three: cancel the yeses. The human layer in this process was not review - it was execution. Someone took ChatGPT's yes list and sent termination notices.
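Reconstructed as code, the workflow the Times describes would look something like the sketch below. This is an illustration, not DOGE's actual script: the model name, the file layout, and the field names are assumptions, and the only things taken from the reporting are the prompt text and the summary-in, yes/no-out shape of the loop.

```python
# Hypothetical reconstruction - the reporting describes the prompt and the inputs,
# not the tooling. File names, field names, and the model are assumptions.
import csv
from openai import OpenAI  # assumes the standard OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Does the following relate at all to D.E.I.? "
    "Respond factually in less than 120 characters. Begin with 'Yes' or 'No.'"
)

def classify(summary: str) -> str:
    """Send one scraped grant summary to the model and return its raw reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # which model DOGE actually used is not reported
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{summary}"}],
    )
    return response.choices[0].message.content.strip()

# Step one's output: a spreadsheet of public-facing summaries, not full applications.
with open("grant_summaries.csv", newline="") as f:
    for row in csv.DictReader(f):
        verdict = classify(row["summary"])
        # Step three: anything beginning with "Yes" goes on the termination list.
        # Note what is missing: no confidence score, no second pass, no human review.
        if verdict.lower().startswith("yes"):
            print(row["grant_id"], "-> flagged for cancellation:", verdict)
```

A loop like this runs through an agency's entire grant portfolio in an afternoon. That is the whole appeal, and the whole problem.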
"Instead of looking closely at funded projects, they pulled short summaries off the internet and fed them into the A.I. chatbot."
- New York Times, March 7, 2026
The problem is that virtually any humanities project can be described in ways that trigger a broad DEI association. A project studying the history of labor unions involves workers of various backgrounds. A project documenting medieval Islamic mathematics involves non-European knowledge traditions. A project on the evolution of English dialects in Appalachian communities involves regional and cultural identity. Feed any of these summaries to a model trained to associate "diversity" broadly and you get a yes.
Researchers and academics watching the cancellations emerge noted projects that had been through rigorous peer review and had no DEI component in any meaningful sense - except that the work involved human beings who weren't all white and male. ChatGPT made no such distinctions. It output yes or no. It cannot tell the difference between "a project that promotes DEI programs" and "a project that studies history involving diverse people."
The coverage of this story has largely focused on the political angle - DOGE cutting grants, Trump's anti-DEI agenda, the attack on humanities funding. Those are legitimate frames. But they're missing the more technically alarming part of what happened.
LLMs hallucinate. This is not a bug that is being fixed - it is a fundamental characteristic of how these systems work. They generate statistically likely completions based on training data. When they don't have sufficient information, they confabulate. When the input is ambiguous, they guess.
In the DOGE/NEH case, the inputs were short summaries scraped from the web - often incomplete, sometimes outdated, and not necessarily accurate representations of the work being funded. The model had no access to the actual grant applications. It had no way to verify anything. It was operating on compressed, potentially lossy information with a binary output requirement.
What is the error rate in this kind of classification task? We don't know. Nobody apparently checked. The New York Times noted the results were "sometimes bizarre," which implies anomalies were visible even to the humans running the process. But bizarre outputs don't get flagged and reviewed in a process designed for speed. They get filed with all the other cancellations.
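To see why that matters, consider a purely illustrative base-rate calculation. None of these numbers are known for the DOGE process - they are placeholders - but the structure holds: when only a small fraction of a portfolio genuinely centers on DEI programming and the threshold is "relates at all," even a modest false-positive rate means most of the flagged grants are flagged wrongly.

```python
# Illustrative only - none of these rates are known for the DOGE process.
total_grants = 1000          # hypothetical portfolio size
truly_dei_focused = 0.05     # assume 5% of projects genuinely center on DEI programs
false_positive_rate = 0.20   # assume the model says "Yes" to 20% of unrelated summaries
true_positive_rate = 0.95    # assume it catches 95% of the genuinely DEI-focused ones

true_hits = total_grants * truly_dei_focused * true_positive_rate          # 47.5
false_hits = total_grants * (1 - truly_dei_focused) * false_positive_rate  # 190.0

flagged = true_hits + false_hits
print(f"Flagged: {flagged:.0f}, of which {false_hits / flagged:.0%} are false positives")
# -> Flagged: 238, of which 80% are false positives
```

Under those hypothetical numbers, four out of five flagged grants would be false positives - and nothing in the process described above would catch them.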
The practical result is that some unknown number of the cancelled grants were probably cancelled on the basis of hallucinated or distorted output. Researchers whose work had nothing to do with DEI programming found their funding terminated because an LLM misread a 200-word blurb. No appeals process was designed with this scenario in mind. There is no mechanism to distinguish "cancelled because of substantive DEI concerns after expert review" from "cancelled because ChatGPT said yes to a summary."
When an algorithm makes the decision, who is accountable? The DOGE process left no clear answer. (Unsplash)
One of the foundational principles of administrative law is that government decisions affecting individuals must be made by humans who can be held accountable and who can explain their reasoning. The Administrative Procedure Act, passed in 1946, established requirements for notice, comment, and reasoned decision-making precisely because unchecked bureaucratic power produces arbitrary outcomes.
The DOGE/NEH process appears to have violated the spirit of this framework, possibly the letter. Grant recipients had their funding cancelled without a human expert evaluating their specific project. The "reasoning" behind the cancellation was a binary output from a commercial AI system operating on incomplete data. That output cannot be interrogated, cross-examined, or appealed in any meaningful sense - because ChatGPT does not explain its reasoning in a form that allows for procedural challenge.
Legal challenges are already forming. Multiple organizations representing affected researchers have indicated they intend to contest the cancellations in federal court, arguing the process was arbitrary, capricious, and violated due process. The legal question is novel: does the use of an LLM to make administrative decisions constitute a "reasoned explanation" under APA standards? Courts have never had to answer this question because no administration has previously outsourced consequential bureaucratic decisions to a commercial chatbot.
The government's probable defense is that the human DOGE employees made the final decisions - that they were using ChatGPT as a tool, not delegating decision-making to it. This argument would be stronger if there were evidence that any human reviewed the AI's outputs critically, or that the process included any mechanism for filtering false positives. Neither appears to exist.
"The prompt was simple: 'Does the following relate at all to D.E.I.? Respond factually in less than 120 characters. Begin with Yes or No.' The results were sweeping, and sometimes bizarre."
- New York Times, March 7, 2026
This is the accountability vacuum that AI-assisted government creates. Decisions happen faster than review processes can follow. The human nominally making the decision is actually just executing an AI's output. The AI itself is a black box operating on inputs it didn't gather and can't verify. Everyone can point to someone else when a researcher asks why their grant was cancelled.
The NEH case is the most visible example of AI-assisted government action so far in 2026, but it's not the first. The pattern of using LLMs to process large volumes of government data quickly - and accepting their outputs with minimal human review - has been visible across multiple DOGE actions since January.
Early in DOGE's operations, reports emerged that the group was using AI tools to scan federal employee emails and communications for terms associated with political disfavor. The same binary classification problem applies there: the model doesn't understand nuance, context, or intent. It pattern-matches on keywords.
At the Department of Education, AI tools were reportedly used to flag grant recipients and contractors for review based on language in their organizational charters and mission statements. At USAID before it was effectively shuttered, AI scanned program descriptions to identify projects associated with certain political categories.
The pattern is consistent: DOGE moves faster than institutional review processes can follow, uses AI to generate the appearance of systematic decision-making, and relies on the complexity of the situation to insulate the process from scrutiny. By the time anyone challenges a specific decision, the funding is already cancelled, the contracts already terminated, the employees already gone.
The NEH represents a relatively small and politically isolated target - humanities research doesn't have a large or well-organized constituency in the current political environment. But the operational template DOGE has demonstrated is not going to stay limited to small agencies.
If AI-assisted classification can gut a $207 million humanities agency, the same approach can be applied to the National Institutes of Health's research portfolio - which runs to roughly $47 billion. It can be applied to the National Science Foundation's grants, to Department of Energy research funding, to the entire apparatus of federally funded science. The limitation so far has been political will, not technical capability.
The implications extend beyond grant programs. Federal contracting, regulatory enforcement, benefits determinations, immigration adjudications - each of these involves large volumes of case-by-case decisions that currently require human expert review. They are all in principle susceptible to the same LLM-assisted acceleration that DOGE applied at the NEH.
"The question isn't whether AI will be used in government decision-making. It already is. The question is whether there will be any accountability structure around it before the damage becomes irreversible."
- Legal scholar commenting on the NEH situation
The parallel development to watch is the broader AI agent ecosystem. Current DOGE deployments appear to use LLMs in a relatively crude way - copy/paste a summary, get a yes/no answer. The next generation of AI agents can read documents directly, cross-reference databases, draft correspondence, and execute actions in software systems without any human intermediary at all. The NEH case used ChatGPT as a classification tool with a human hitting send on the termination notices. Future systems may not require the human in the middle.
OpenAI, whose ChatGPT was apparently used in this process, has not commented publicly on DOGE's use of its product for grant cancellation decisions. This silence is itself revealing. The company recently signed a significant contract with the Department of Defense, has been navigating public controversy about military applications, and is under scrutiny from employees and researchers who quit over Pentagon deals. "Our product was used to gut humanities funding on the strength of a 120-character prompt" is not a comfortable addition to that list.
The broader AI industry has consistently argued that its products are neutral tools that can be used for good or ill, and that responsibility for deployment decisions lies with users. This argument is legally and commercially convenient. It is also under increasing strain as specific, documented harms from AI deployments accumulate.
The NEH case presents a particular challenge to the neutral tools argument because the failure mode is technical, not just political. ChatGPT's tendency to over-trigger on DEI associations isn't a bug that a more careful user could have avoided - it is a predictable characteristic of how these models work. The limitations of binary classification on short, context-stripped text are well-documented in the AI research literature. Using ChatGPT for this task without accounting for these limitations isn't just poor judgment. It is a misuse of the tool that OpenAI's own documentation would advise against.
The immediate news story is about grants cancelled and researchers defunded. The longer-term story is harder to see but more consequential.
Humanities research produces infrastructure that the rest of scholarship depends on. Digitized archives that historians, journalists, and lawyers use. Oral history collections that provide primary source material for understanding communities and events. Translation of texts that make knowledge from other civilizations accessible to researchers across disciplines. Language documentation that preserves dying languages - knowledge lost when the last speaker dies cannot be recovered.
When a grant is cancelled mid-project, the damage is not limited to the dollar amount. Researchers lose jobs. Teams dissolve. Institutional knowledge disappears. Physical materials - documents, artifacts, recordings - may not survive if the project meant to preserve them runs out of funding. A digitization project halfway complete is worse than no project at all in some respects, because the work done represents sunk cost with no deliverable.
The NEH has been tracking the downstream impact of its grants for decades. A single successful humanities project can generate citations, derivative research, and cultural products for generations. The NEH itself estimates that every dollar of grant funding generates multiple dollars in economic activity through university employment, publication, and related activity. More importantly, it generates intellectual capital that is genuinely difficult to price but clearly real.
ChatGPT, operating on 200-word summaries with a 120-character output limit, made no calculation of any of this. It answered yes or no. The people running the process designed to find DEI and cut it appear not to have thought about what they were cutting along with it.
This is the second-order cost that AI-accelerated government action systematically ignores. Speed produces decisions. Decisions produce consequences. The consequences unfold over years and decades, long after the news cycle has moved on and the accountability has dispersed into a system where everyone can credibly claim they were just doing their job.
The administrative and legal challenges to the NEH cancellations are in early formation as of this writing. Multiple angles are being explored by researchers, universities, and civil liberties organizations.
The APA challenge - that the cancellations were arbitrary and capricious because they were made without reasoned explanation based on actual review of the specific grants - is probably the strongest procedural argument. Courts have previously struck down agency actions that were conducted without adequate basis in fact or law, and using ChatGPT on internet summaries arguably fails the "reasoned explanation" test on its face.
A due process challenge is also likely, arguing that grant recipients had a property interest in their funding and were deprived of it without adequate notice or hearing. This is harder - courts have been somewhat deferential to government discretion in grant programs - but the scale and speed of the cancellations, and the manifest arbitrariness of the process, may attract judicial attention.
Some legal groups are also exploring First Amendment angles. If grants were cancelled because their subject matter was politically disfavored - screened with a chatbot prompt calibrated to political terms - that potentially enters territory where courts have traditionally been protective of speech.
What is less clear is what remedy courts could order. The NEH under current leadership is unlikely to voluntarily restore cancelled grants. Injunctive relief requiring restoration while challenges proceed is possible but uncertain. And even if courts ultimately rule against the process, the practical damage - to research projects, to careers, to the institutional fabric of humanities scholarship - will be difficult to undo.
Every time a government process uses AI to make consequential decisions without adequate oversight, and that process succeeds - meaning it isn't immediately overturned by courts or reversed by public pressure - it becomes the template for the next process.
DOGE's operations are being watched by government efficiency advocates, by foreign governments exploring similar approaches, and by the AI industry looking for deployment contexts. If the NEH cancellations stand, if the legal challenges fail or are delayed long enough to not matter, the lesson learned is that AI-assisted mass decision-making works. That lesson will be applied in contexts with far larger stakes.
The technical and procedural questions raised by the ChatGPT/NEH case don't have political valence. They apply equally regardless of which party is in power and which programs are being evaluated. An AI system that makes consequential decisions without adequate oversight is a problem whether it is being used to cut programs a conservative government disfavors or programs a progressive government disfavors. The issue is the absence of accountability in the process, not the political direction of the outcome.
What DOGE demonstrated at the NEH is that it is operationally feasible to use a commercial AI chatbot to process and terminate large numbers of government grants with minimal human review, in a timeframe that outpaces institutional response. That is not a political statement. It is a capabilities demonstration. And it will be replicated.
The question now is whether Congress, the courts, or the AI industry itself will establish guardrails before the next demonstration involves something larger than $207 million in humanities funding. The signs so far are not encouraging. Congress is not moving on AI governance at the speed AI is moving into governance. The courts work on timescales measured in years. The AI industry's financial incentives run in the direction of more deployment, not less.
For the researchers watching their funding disappear based on a chatbot's binary guess, the philosophical question about AI governance is less pressing than the immediate practical one: who do you appeal to when the decision was made by a model, executed by someone following orders, and signed off by an institution that has since moved on to the next agency on its list?
The answer, right now, is nobody. And that is exactly how this was designed to work.