Academic writing has completely changed: Turnitin forges ahead

Vol. 5, No. 3 (Fall 2023)

By Brian Hotson, Editor, CWCR/RCCR


On July 20, 2023, OpenAI, the parent company of ChatGPT and Dall·E, stopped offering its GenAI detection tool, AI classifier, saying that it “is not fully reliable” (OpenAI, 2023). There’s a short statement on OpenAI’s website:

As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated. (Kirchner, Ahmad, Aaronson, & Leike, 2023, January 1; italics in the original)

There, OpenAI describes its AI classifier:

Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). (Kirchner, Ahmad, Aaronson, & Leike, 2023, January 1; emphasis in the original)

There was a lot of handwringing in response among the tech-class online: The Verge said, OpenAI can’t tell if something was written by AI after all and PC World, OpenAI’s ChatGPT is too good for its own AI to detect. Considering the impact of OpenAI’s announcement, there was little to no other coverage. Meanwhile, Turnitin, valued at $1.75 billion USD in 2019, continues to offer GenAI writing detection.

All change

As a result of the launch of ChatGPT in November 2022, fundamental changes to higher education have happened, and continue to happen, quickly and with unforeseen consequences. A US poll published in March 2023 found that “43% of college students have used ChatGPT or a similar AI application” and 22% “say they have used them to help complete assignments or exams,” representing “1 in 5 college students” (Welding, 2023, March 27). Inside Higher Ed published a piece, The Oncoming AI Ed-Tech ‘Tsunami’, predicting “[t]he AI-in-education market is expected to grow from approximately $2 billion in 2022 to more than $25 billion in 2030, with North America accounting for the largest share” (D’Agostino, 2023, April 18). What was a relevant response for GenAI in December 2022 is now ancient history. The scene is fluid: there are few predictive models, and no one knows what might come next. A classroom instructor is quoted in May 2023: “AI has already changed the classroom into something I no longer recognize” (Bogost, 2023, May 16).

On April 4, 2023, Turnitin launched its AI detection tool (Chechitelli, 2023, March 16). At the time, Turnitin’s CEO, Chris Caren, wrote,

…we are pleased to announce the launch of our AI writing detection capabilities… To date, the statistical signature of AI writing tools remains detectable and consistently average. In fact, we are able to detect the presence of AI writing with confidence. We have been very careful to adjust our detection capabilities to minimize false positives and create a safe environment to evaluate student writing for the presence of AI-generated text. (Caren, 2023, April 4)

On April 3, the Washington Post, which had early access, tested the accuracy of Turnitin’s tool using 16 “samples of real, AI-fabricated and mixed-source essays.” It found the tool

…got over half of them at least partly wrong. Turnitin accurately identified six of the 16 — but failed on three… And I’d give it only partial credit on the remaining seven, where it was directionally correct but misidentified some portion of ChatGPT-generated or mixed-source writing. (Fowler, 2023, April 3)

Pieces in Rolling Stone, The Atlantic, and USA Today found similar results. Postings began to appear on r/ChatGPT from students who said their instructors had falsely accused them, on the basis of Turnitin’s tool, of using GenAI to write their papers.

What about the backdoor?

At the same time, r/ChatGPT also provides information on how to skirt AI detection, and this advice is growing in sophistication. Tips appeared in May 2023 on how to “pass Turnitin AI detection” using ChatGPT and Grammarly (Woodford, 2023, May 14). Students were using ChatGPT to fool the detection tools with prompts that turned ChatGPT into a ghostwriter that mimicked a student’s tone and voice. A student, interviewed by the New York Times, explained how they gave ChatGPT a sample of their writing, and asked ChatGPT

“…to rewrite this paragraph to make it sound like me…So, I copied [and] pasted a page of what I’d already written and then it rewrote that paragraph, & I was like, this works” (Tan, 2023, June 26).

Online web tools began to appear, such as UndetectableAI, HideMyAI, and QuillBot, designed specifically to fool GenAI detection tools.

Screen shot of UndetectableAI.

Also in May, Turnitin began to provide caveats for its AI detection tool. David Adamson, an AI scientist and Turnitin employee, says in a Turnitin-produced video, Understanding false positives within Turnitin’s AI writing detection capabilities, that instructors need to do some work when using the tool. He admits that the results of submissions to the AI detection tool will need to be taken “with a big grain of salt,” saying that, in the end, it is up to the instructor to decide what is created by GenAI and what isn’t: “You, the instructor, have to make the final interpretation” (Turnitin, 2023, May 23). These false positives, according to Adamson, have different “flavours.” These flavours are specific kinds of writing that Turnitin’s tool is not good at classifying correctly. These include:

  • Repetitive writing: the same words used again and again.
  • Lists, outlines, short questions, code, or poetry.
  • Developing writers, English-language learners, and those writing at middle and high school levels.

Adamson ends the video by saying, “we own our mistakes. We want to…share with you how and when we are wrong” (Turnitin, 2023, May 23). These mistakes, Adamson states, represent ~1%, or 1 in 100, of submissions through the tool.

While this may be acceptable to Turnitin, this 1% represents real student assignments, written by real students. By May 14, 2023, Turnitin reported that 38.5 million submissions had been examined by its GenAI detection tool, “with 9.6% of those documents reporting over 20% of AI writing and 3.5% over 80% of AI writing” (Merod, 2023, June 7). If we use Adamson’s rate of 1% false positives, 3.5% of 38.5 million submissions is about 1.3 million, and 1% of 1.3 million is about 13,000 student papers that were found to be written in part by GenAI when in fact they were not. If Adamson’s 1% false-positive rate is applied to the 9.6% of papers reported with over 20% of AI writing (about 3.7 million assignments), the total is about 37,000 assignments. Together, this is approximately 50,000 false positives affecting 50,000 students. For scale, consider two of Canada’s largest schools: the University of Alberta has a student population of 40,100 and York University, 55,700. For students, being accused of an academic violation can not only affect their academic record but also cause anxiety, loss of scholarships, and cancellation of student visas. Turnitin’s AI scientist Adamson says that the 1% false positive rate is “pretty good…” (Turnitin, 2023, May 23).
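The arithmetic in the paragraph above can be reproduced in a few lines. The following is a minimal sketch in Python, not anything published by Turnitin: the function name and constants are illustrative, the inputs are the figures quoted in the text, and it follows the text in applying a flat 1% rate to each band of flagged submissions and then adding the two bands together.

```python
# Figures quoted in the text (Merod, 2023, June 7; Turnitin, 2023, May 23).
TOTAL_SUBMISSIONS = 38_500_000
SHARE_OVER_80 = 0.035        # documents reported with over 80% AI writing
SHARE_OVER_20 = 0.096        # documents reported with over 20% AI writing
FALSE_POSITIVE_RATE = 0.01   # Adamson's "~1%" figure


def estimate_false_positives(total, flagged_share, rate):
    """Return (flagged documents, estimated false positives).

    Assumption: the flat rate is applied to the flagged documents,
    as the article does. This is an illustration, not Turnitin's method.
    """
    flagged = total * flagged_share
    return flagged, flagged * rate


flagged_80, false_80 = estimate_false_positives(
    TOTAL_SUBMISSIONS, SHARE_OVER_80, FALSE_POSITIVE_RATE
)
flagged_20, false_20 = estimate_false_positives(
    TOTAL_SUBMISSIONS, SHARE_OVER_20, FALSE_POSITIVE_RATE
)
combined = false_80 + false_20

print(f"Over 80% band: {flagged_80:,.0f} flagged, {false_80:,.0f} estimated false positives")
print(f"Over 20% band: {flagged_20:,.0f} flagged, {false_20:,.0f} estimated false positives")
print(f"Combined estimate: {combined:,.0f}")
```

Unrounded, the two bands come to roughly 13,500 and 37,000 papers, which is where the figure of approximately 50,000 comes from.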

Turnitoff?

The University of Pittsburgh and Vanderbilt University have decided to not use Turnitin’s tool. The University of Pittsburgh

has concluded that “current AI detection software is not yet reliable enough to be deployed without a substantial risk of false positives and the consequential issues such accusations imply for both students and faculty. Use of the detection tool at this time is simply not supported by the data and does not represent a teaching practice that we can endorse or support.” Because of this, the Teaching Center will disable the AI detection tool in Turnitin effective immediately (Teaching Center doesn’t endorse, 2023, June 23).

Vanderbilt indicated that they’d “decided to disable Turnitin’s AI detection tool for the foreseeable future. This decision was not made lightly and was made in pursuit of the best interests of our students and faculty,” citing Turnitin’s lack of transparency about how its tool works as well as the false positive rate (Coley, 2023, August 16). Vanderbilt also did the math regarding the impact of false positives on their students:

Vanderbilt submitted 75,000 papers to Turnitin in 2022. If this AI detection tool was available then, around 3,000 student papers would have been incorrectly labeled as having some of it written by AI. Instances of false accusations of AI usage being leveled against students at other universities have been widely reported over the past few months, including multiple instances that involved Turnitin… In addition to the false positive issue, AI detectors have been found to be more likely to label text written by non-native English speakers as AI-written. (Coley, 2023, August 16).

Vanderbilt concluded, “we do not believe that AI detection software is an effective tool that should be used” (Coley, 2023, August 16).

Canadian higher education institutions have a mixed approach to AI detectors. On April 4, 2023, the University of British Columbia responded quickly to Turnitin’s AI detection tool, stating that they will not enable it. Their reasoning, among others, includes: “Instructors cannot double-check the feature results”; “Results from the feature are not available to students”; and that the ability “of the feature to keep up with rapidly evolving AI is unknown” (University of British Columbia, 2023, April 4).

Nipissing University, in their senate-approved June 2023 “Generative AI Guide for Instructors,” mentions that “use of generative AI ‘detectors’ is not recommended” (n.p.); the University of Waterloo similarly cautions faculty, “controlling the use of AI writing through surveillance or detection technology is not recommended” (Frequently Asked Questions, 2023, July 25). Conestoga College, in its guide to using Turnitin’s tool, instructs faculty to indicate that they are using the tool: “Without such notice, a student may at some point appeal” (Sharpe, 2023, June 27).

Others, such as the University of Lethbridge and Queen’s University, use Turnitin without caveats specific to the AI detection tool that I could find on their public-facing websites at the time of writing.

What is the literature saying?

A paper published this month in the International Journal for Educational Integrity, “Evaluating the efficacy of AI content detection tools in differentiating between human and AI‑generated text,” which did not include Turnitin, found the “performance” of AI detection tools[1] on GPT 4-generated content was “notably less consistent” in differentiating between human and AI-written text (Elkhatat, Elsaid, & Almeer, 2023, p. 6). “Overall, the tools struggled more with accurately identifying GPT 4-generated content than GPT 3.5-generated content” (p. 8). The findings of this study should raise questions about using GenAI detection tools in higher education:

While this study indicates that AI-detection tools can distinguish between human and AI-generated content to a certain extent, their performance is inconsistent and varies depending on the sophistication of the AI model used to generate the content. This inconsistency raises concerns about the reliability of these tools, especially in high-stakes contexts such as academic integrity investigations. (p. 12-13)

The paper concludes that “the varying performance [of detection tools on ChatGPT 3.5 and ChatGPT 4] underscores the intricacies involved in distinguishing between AI and human-generated text and the challenges that arise with advancements in AI text generation capabilities” (p. 14).

International students take the brunt, again

In higher education, it is well documented that international students, who make up “approximately 17% of all post-secondary enrollments in Canada” (Shokirova, et al., 2023, August 23), are accused of academic integrity breaches at a higher rate than domestic students (see, for example, Adhikari, 2018; The complex problem…, 2019; Eaton & Hughes, 2022; Fass-Holmes, 2017; Hughes & Eaton, 2022). As we see in writing centres, undergraduate international students are often English-language learners, many of whom are writing academic papers in English at post-secondary levels for the first time. As a result, many undergraduate international students’ level of writing in academic English is low. Some students that I have tutored take several years of writing practice to attain a level of academic writing many in the academy consider “polished” or at post-secondary levels.

According to Turnitin, students who are English-language learners, developing writers, or writing at a secondary level are at a higher risk of false positives from their tool. Adamson admits that Turnitin’s false positive rate is “slightly higher” for these students: “Still near our 1% target, but there is a difference” (Turnitin, 2023, May 23). At the same time, Adamson also claims that Turnitin doesn’t see “any evidence” that the tool is “biased against English language learners from any country at any level” (Turnitin, 2023, May 23). Unfortunately, I was not able to find data published by Turnitin to substantiate these claims, including what the difference in the false-positive rate for these students is: What does Turnitin consider “near” their “1% target”? Is it 2%, 3.5%, 1.5%? Considering the large numbers involved, 38.5 million as of May 2023, even a 0.5% increase is significant.
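To give a sense of scale for the question above, the following is a rough sketch in Python of how many additional false positives each candidate rate would produce relative to the 1% target. It rests on a loud assumption: Turnitin has not published what share of the 38.5 million submissions comes from English-language learners or developing writers, so the sketch applies each rate to the full total, which makes the results an upper bound rather than an estimate. The function name is illustrative.

```python
TOTAL_SUBMISSIONS = 38_500_000   # reported by Turnitin as of May 14, 2023
TARGET_RATE = 0.01               # Turnitin's stated "1% target"

# Candidate rates raised as questions in the text; none is confirmed by Turnitin.
CANDIDATE_RATES = (0.015, 0.02, 0.035)


def extra_false_positives(total, target, candidate):
    """Additional false positives if `candidate` replaced `target`.

    Assumption: the rate is applied to every submission, which overstates
    the effect because only some submissions are from affected students.
    """
    return total * (candidate - target)


for rate in CANDIDATE_RATES:
    extra = extra_false_positives(TOTAL_SUBMISSIONS, TARGET_RATE, rate)
    print(f"{rate:.1%} instead of {TARGET_RATE:.0%}: up to {extra:,.0f} additional false positives")
```

Under that upper-bound assumption, a rate of 1.5% rather than 1% corresponds to as many as 192,500 additional papers; the real number depends on how many submissions come from the affected students.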

What will happen in September?

Like the winter semester of 2023, it may well be that the first assignments submitted this month will start another round of changes to academic writing, academic integrity, and students’ use of GenAI tools. It will be important for institutions to monitor and update their policies and procedures regarding AI detection tools, like Turnitin, in response to possible changes to GenAI writing. Specific attention should be paid to international students in these cases, as they are already vulnerable within higher education.


References

Adhikari, S. (2018). Beyond culture: Helping international students avoid plagiarism. Journal of International Students, 8(1), 375–388. https://doi.org/10.5281/zenodo.1134315

Bogost, I. (2023, May 16). The First Year of AI College Ends in Ruin. The Atlantic. https://www.theatlantic.com/technology/archive/2023/05/chatbot-cheating-college-campuses/674073/

Caren, C. (2023, April 4). The launch of Turnitin’s AI writing detector and the road ahead. Turnitin. https://www.turnitin.com/blog/the-launch-of-turnitins-ai-writing-detector-and-the-road-ahead

Chechitelli, A. (2023, March 16). Understanding false positives within our AI writing detection capabilities. Turnitin. https://www.turnitin.com/blog/understanding-false-positives-within-our-ai-writing-detection-capabilities

Coley, M. (2023, August 16). Guidance on AI detection and why we’re disabling Turnitin’s AI detector. Vanderbilt University. https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/

The complex problem of academic dishonesty among international students – Study International. (2019, March 25). International Study. https://www.studyinternational.com/news/the-complex-problem-of-academic-dishonesty-among-international-students/

D’Agostino, S. (2023, April 18). The Oncoming AI Ed-Tech ‘Tsunami’. Inside Higher Ed. https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2023/04/18/oncoming-ai-ed-tech-tsunami

Eaton, S. E., & Hughes, J. C. (2022). Academic Integrity in Canada. In S. E. Eaton (Ed.), Ethics and Integrity in Educational Contexts (Vol. 1, pp. xi-xvii). Springer. https://doi.org/10.1007/978-3-030-83255-1

Elkhatat, A. M., Elsaid, K., & Almeer, S. (2023). Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity, 19(17). https://doi.org/10.1007/s40979-023-00140-5

Fass-Holmes, B. (2017). International students reported for academic integrity violations: Demographics, retention, and graduation. Journal of International Students, 7(3), 644–669. https://doi.org/10.5281/zenodo.570026 

Frequently Asked Questions: ChatGPT and generative AI in teaching and learning at the University of Waterloo. (2023, July 25). Associate-Vice President, Academic, University of Waterloo. https://uwaterloo.ca/associate-vice-president-academic/frequently-asked-questions-chatgpt-and-generative-ai

Hughes, J. C., & Eaton, S. E. (2022). Student integrity violations in the academy: More than a decade of growing complexity and concern. In S. E. Eaton & J. C. Hughes (Eds.), Ethics and Integrity in Educational Contexts (Vol. 1, pp. 61–79). Springer. https://doi.org/10.1007/978-3-030-83255-1_3

Johnson, S. (2019, March 6). Turnitin to Be Acquired by Advance Publications for $1.75B. EdSurge. https://www.edsurge.com/news/2019-03-06-turnitin-to-be-acquired-by-advance-publications-for-1-75b

Kirchner, J., Ahmad, L., Aaronson, S., & Leike, J. (2023, January 1). New AI classifier for indicating AI-written text. OpenAI. https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text

Sharpe, A. (2023, June 27). Using Turnitin’s Artificial Intelligence (AI) Detection Tool and the Process Guide for Navigating Potential Academic Offences. Faculty Learning Hub, Conestoga College. https://tlconestoga.ca/using-turnitins-artificial-intelligence-ai-detection-tool-and-the-process-guide-for-navigating-potential-academic-offences/

Shokirova, T., Brunner, L. R., Kishor Karki, K., Coustere, C., & Valizadeh, N. (2023, August 23). Reinventing the reception of students from abroad in graduate studies. University Affairs. https://www.affairesuniversitaires.ca/opinion/a-mon-avis/reinventer-laccueil-des-etudiant-e-s-de-letranger-aux-cycles-superieurs/

Tan, S. (2023, June 26). Suspicion, Cheating and Bans: A.I. Hits America’s Schools. The Daily, New York Times. https://www.nytimes.com/2023/06/28/podcasts/the-daily/ai-chat-gpt-schools.html?searchResultPosition=1

Teaching Center doesn’t endorse any generative AI detection tools. (2023, June 23). University Times, University of Pittsburgh. https://www.utimes.pitt.edu/news/teaching-center-doesn-t

University of British Columbia. (2023, April 4). UBC not enabling Turnitin’s AI-detection feature. https://lthub.ubc.ca/2023/04/04/ubc-not-enabling-turnitins-ai-detection/

Welding, L. (2023, March 27). Half of college students say using AI on schoolwork is cheating or plagiarism. BestColleges. https://www.bestcolleges.com/research/college-students-ai-tools-survey/

Woodford, A. (2023, May 14). Can Turnitin detect ChatGPT? ChatGPT Prompts. https://www.chatgpt-prompts.net/can-turnitin-detect-chatgpt/

[1] The detection tools in the study were OpenAI, Writer, Copyleaks, GPTZero, and CrossPlag.