Why companies lie about detecting AI writing

Dec 13, 2023


Sohan Choudhury | sohan@flintk12.com

Sinister AI detection company overseeing a classroom

Illustrated by AI, written by humans.

Within a year, originality in writing has been completely eroded. ChatGPT was the first AI chat interface to gain widespread consumer adoption, and it did so on an unprecedented scale. Almost immediately, its 100 million users discovered that one of the biggest strengths of large language models is writing.

ChatGPT lets anyone write anything, it seems. Now, with more powerful models like GPT-4, Claude, and Gemini being released, the quality of AI-generated writing has improved substantially.

While this makes LLMs phenomenal productivity assistants, the threat to education is clear. If AI trivializes writing, how do teachers effectively teach and assess students growing up in a world where their term paper is just a ChatGPT prompt away?

The tempting logical conclusion is to find AI’s kryptonite — a way to detect AI writing. Unfortunately, AI detection is a fool’s errand.

AI detection doesn’t work, and never will

AI detection doesn’t work, and it never will. That’s the harsh, ugly truth. Anyone who says otherwise is misinformed or trying to sell you a product that doesn’t work. More on that later.

The evidence for this truth is easy to find and blatantly clear to anyone who has looked into the issue. “AI writing detectors” have high false positive rates that vendors do not transparently disclose, and they disproportionately flag writing from non-native English speakers as AI-generated, which raises serious ethical concerns.

Perhaps the best evidence for this truth is that OpenAI, the creator of ChatGPT and the industry leader in AI models, discontinued their own AI detection tool with the notice:

“As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy.”

If the company that has built the most powerful foundational AI models can’t detect the output that its own AI produces, what chance does a company working without that expertise or proprietary information have?

You don’t have to guess. Two of the biggest “AI writing detectors” have been shown to be inaccurate. ZeroGPT was found to have a false positive rate of 20%, and Turnitin walked back its accuracy claims after teacher backlash; it has not published full accuracy results or allowed independent researchers to evaluate its software.
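
To make that 20% figure concrete, here is a back-of-the-envelope sketch. The class size and assignment count are hypothetical; only the 20% false positive rate comes from the ZeroGPT finding above.

```python
# Hypothetical classroom math for a detector with a 20% false positive rate.
class_size = 30             # students, none of whom used AI (assumption)
false_positive_rate = 0.20  # ZeroGPT's reported false positive rate

falsely_accused = class_size * false_positive_rate
print(f"Expected false accusations per assignment: {falsely_accused:.0f} of {class_size}")
# -> Expected false accusations per assignment: 6 of 30

# Over a semester of 5 essays, assuming each flag is independent:
assignments = 5
p_flagged_once = 1 - (1 - false_positive_rate) ** assignments
print(f"Chance an honest student is flagged at least once: {p_flagged_once:.0%}")
# -> Chance an honest student is flagged at least once: 67%
```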

A September 2023 study in the International Journal for Educational Integrity produced the following two bar graphs. The first shows that detectors’ diagnostic accuracy on writing produced by ChatGPT-4 varies widely; the second shows alarming rates of false positives on human writing. The study concluded that there was “considerable variability” in the detectors’ efficacy and that the tools’ “inconsistent performance and dependence on the sophistication of the AI models” discourages their exclusive use for enforcing academic integrity.

[Figures: diagnostic accuracy of AI detectors on ChatGPT-4 writing, showing wide variance in detection ability, and on human writing, showing high false positive rates.]

The fundamental, unsolvable engineering challenge is this: the creators of AI models (OpenAI, Anthropic, etc.) have not been able to watermark model output with reliably detectable patterns, and third parties cannot reliably detect the statistical signatures of AI writing. As AI models continue to improve, the writing they produce will show creativity, variety, and structure indistinguishable from human writing. This means the accuracy of current detection methods will rapidly get worse, not better.
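
To see why third-party detection is so brittle, consider a toy classifier of the general kind these tools resemble, one that scores “burstiness” (variation in sentence length) and flags uniform text as AI. This is an illustrative sketch under that assumption, not any vendor’s actual method:

```python
import statistics

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths, a crude proxy for human 'burstiness'."""
    sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def naive_detector(text: str, threshold: float = 3.0) -> str:
    # Uniform sentence lengths get flagged as "AI", but so does much human writing.
    return "flagged as AI" if burstiness(text) < threshold else "passes as human"

# A formulaic but entirely human style, common for novice or non-native writers:
human = ("The experiment began at noon. We measured the samples carefully. "
         "The results were recorded daily. The data showed a clear trend. "
         "We concluded the test early.")

# AI output prompted to vary its rhythm:
ai = ("It failed. After weeks of careful measurement across dozens of trials, "
      "the pattern finally emerged, unmistakable and strange. We stopped.")

print(naive_detector(human))  # flagged as AI (a false positive)
print(naive_detector(ai))     # passes as human (a false negative)
```

Any surface statistic a detector keys on has this problem: humans and prompted models can each land on either side of the threshold, which is exactly the inconsistency the study above measured.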

So, why lie?

Why do companies lie about AI writing detection?

The job of any company is to make a product the market wants, creating supply to meet demand. The most innovative companies build products for emerging demands, and AI writing detection is an enormous emerging demand, driven by the equally new problem of students using AI to write.

Between schools and universities, there are millions of potential customers for AI writing detectors. The promise of a new multi-billion dollar market is incredibly tempting, especially if your target customers are educators who have an immediate need and lack the bandwidth to evaluate rapidly changing technology that is barely a year old.

Two-faced corporate representative marketing technology with questionable ethics and lack of transparency driven by evil monetary greed

Beyond the obvious — albeit clearly unethical — opportunity to sell fundamentally broken tools to buyers desperate for a solution, there’s a strong incentive for some companies to maintain the status quo.

Turnitin, for example, has spent over 25 years addressing plagiarism via similarity detection, which is powered by internet sources and a massive trove of private submission data from over 30 million students. Seemingly overnight, generative AI made similarity detection useless for preventing cheating: instead of plagiarizing, students can generate entirely novel, untraceable writing with ChatGPT in seconds.

If similarity detection no longer helps to ensure academic honesty, why should institutions continue to pay for Turnitin? This is an existential business risk for Turnitin and explains their desperate yet inevitably doomed attempts to introduce AI detection into their products.

Ultimately, we all want to believe that AI writing detection is possible because we’re afraid of change. We know that if AI writing can’t be detected, the future of teaching writing is in jeopardy. We’re too scared to rip off the band-aid, which makes us susceptible to deception from those who stand to profit by misinforming us.

What should teachers do?

The only viable option for teachers is to embrace this change, as disruptive as it may be.

First, it’s important to recognize the reality that we’re in. Students will use AI to write, and there’s no way to detect if they do.

While this is concerning for academic honesty, it’s important to remember that cheating on writing has always been a problem. Contract cheating through essay mills was so widespread that by 2017, over 31 million students had paid someone to write their paper for them. Before AI, 20,000 people in Kenya earned a living writing essays full-time.

What’s changed in the last year, then, is that cheating has been democratized: thanks to generative AI, students can cheat on essays instantly and for free.

Student collaborating with AI to write and edit

Banning AI may be just as tempting as trying to detect AI writing, but it’s also not a realistic solution — especially when AI writing functionality is becoming embedded into everyday tools like Microsoft Word and Google Docs.

It’s important to understand that students use AI for writing because these tools are inherently valuable. For students, AI writing:

  1. Helps them get a better grade

  2. Helps them improve their writing

While #1 is a concern for fair and accurate assessment, there’s potential for real educational value in #2. If students are using AI to get more immediate and tailored feedback, we should dig in here to understand how to best use AI to systematically improve student writing skills.
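
As a sketch of what #2 could look like in practice, here is a minimal feedback loop using the OpenAI Python client. The model name and rubric below are placeholder assumptions, not a recommendation of any particular setup:

```python
# pip install openai   (expects OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()

RUBRIC = "thesis clarity, use of evidence, paragraph structure, and voice"  # placeholder rubric

def essay_feedback(essay: str) -> str:
    """Ask the model for coaching-style feedback rather than a rewrite."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any capable chat model works
        messages=[
            {"role": "system",
             "content": (f"You are a writing coach. Give specific, constructive feedback on {RUBRIC}. "
                         "Do not rewrite the essay; ask guiding questions instead.")},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content

print(essay_feedback("In this essay I will argue that the calculator analogy fails..."))
```

The design choice that matters here is the system prompt: feedback that asks guiding questions keeps the student doing the writing, which is the difference between #2 and #1.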

The future of learning to write

Learning to write will always be important. Generative AI is not to writing what the calculator was to math. I love Dan Willingham’s rebuttal of that analogy, in which he explains how, unlike in math, the content and process of writing always shape the outcome.

So, how do educators continue to teach and hone the skill of writing given the advent of generative AI?

In the long run, how we assess writing has to change to become more adaptive to every student, and conversational assignments seem like a step in the right direction — especially as a replacement for standard written responses.

For essay writing, humanities teachers have been using AI to provide students with much more immediate and tailored feedback on their writing, thereby removing a key bottleneck in the learning process.

By having open conversations with students about their use of AI in writing, teachers can set guidelines around appropriate use and start to figure out what the future of learning to write will look like for their students.

And, here at Flint, we’re committed to never building or selling AI writing detectors.

Spark AI-powered learning at your school.

Sign up to start using Flint, free for up to 80 users.

Watch the video
