Responsibility for the use of GenAI in academic pursuits in higher education is shared among the user, the tool and, in instances where the tool is part of teaching and learning processes, the institution. As such, to say that students using ChatGPT as a research tool bear sole responsibility for the accuracy of the information the tool provides is unethical and unjust. This is especially the case if the student is directed by an instructor to use the tool. It can be argued that the institution bears responsibility if it doesn’t provide instruction (digital literacy) on using the tools.
The anthropomorphism of GenAI writing and research tools marks their results differently from those of Google Scholar or Wikipedia, for example. GenAI tools, promoted as research and writing tools, bear equal and sometimes greater responsibility for the information they provide. These tools often position themselves within the limitations of their actions and the availability and accuracy of the data on which they draw, by providing caveats with their answers. At the same time, the anthropomorphic language that is used in providing these answers is convincing and authoritative. As a result, these tools bear responsibility not only for the information they provide but also for its authoritative presentation. There is a responsibility to those who use this information and for the work that they produce as a result of the tool, especially in light of OpenAI’s own admission that ChatGPT “hallucinates” or makes up information.
On July 20, 2023, OpenAI, the parent company of ChatGPT and Dall·E, stopped offering its GenAI detection tool, AI classifier, saying that it “is not fully reliable” (OpenAI, 2023). There’s a short statement on OpenAI’s website:
As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated. (Kirchner, Ahmad, Aaronson, & Leike, 2023, January 1; italics in the original)
There, OpenAI describes its AI classifier:
Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). (Kirchner, Ahmad, Aaronson, & Leike, 2023, January 1; emphasis in the original)
As a result of the launch of ChatGPT in November 2022, fundamental changes to higher education have happened, and continue to happen, quickly and with unforeseen consequences. A US poll published in March 2023 found that “43% of college students have used ChatGPT or a similar AI application” and 22% “say they have used them to help complete assignments or exams,” representing “1 in 5 college students” (Welding, 2023, March 27). Inside Higher Ed published a piece, The Oncoming AI Ed-Tech ‘Tsunami’, predicting “[t]he AI-in-education market is expected to grow from approximately $2 billion in 2022 to more than $25 billion in 2030, with North America accounting for the largest share” (D’Agostino, 2023, April 18). What was a relevant response for GenAI in December 2022 is now ancient history. The scene is fluid: there are few predictive models, and no one knows what might come next. A classroom instructor is quoted in May 2023: “AI has already changed the classroom into something I no longer recognize” (Bogost, 2023, May 16).
On April 4, 2023, Turnitin launched its AI detection tool (Chechitelli, 2023, March 16). At the time, Turnitin’s CEO, Chris Caren, wrote,
…we are pleased to announce the launch of our AI writing detection capabilities… To date, the statistical signature of AI writing tools remains detectable and consistently average. In fact, we are able to detect the presence of AI writing with confidence. We have been very careful to adjust our detection capabilities to minimize false positives and create a safe environment to evaluate student writing for the presence of AI-generated text. (Caren, 2023, April 4)
On April 3, the Washington Post, which had early access, tested the accuracy of Turnitin’s tool using 16 “samples of real, AI-fabricated and mixed-source essays.” It found the tool
…got over half of them at least partly wrong. Turnitin accurately identified six of the 16 — but failed on three… And I’d give it only partial credit on the remaining seven, where it was directionally correct but misidentified some portion of ChatGPT-generated or mixed-source writing. (Fowler, 2023, April 3)
At the same time, r/ChatGPT provides information on how to skirt AI detection, and this advice grows in sophistication. Tips appeared in May 2023 on how to “pass Turnitin AI detection” using ChatGPT and Grammarly (Woodford, 2023, May 14). Students were fooling the detection tools with prompts that turned ChatGPT into a ghost writer that mimicked a student’s tone and voice. A student, interviewed by the New York Times, explained how they gave ChatGPT a sample of their writing, and asked ChatGPT
“…to rewrite this paragraph to make it sound like me…So, I copied [and] pasted a page of what I’d already written and then it rewrote that paragraph, & I was like, this works” (Tan, 2023, June 26).
Also in May, Turnitin began to provide caveats for its AI detection tool. David Adamson, an AI scientist and Turnitin employee, says in a Turnitin-produced video, Understanding false positives within Turnitin’s AI writing detection capabilities, that instructors need to do some work when using the tool. He admits that the results of submissions to the AI detection tool will need to be taken “with a big grain of salt,” and that, in the end, it is up to the instructor to decide what is created by GenAI and what isn’t: “You, the instructor, have to make the final interpretation” (Turnitin, 2023, May 23). These false positives, according to Adamson, have different “flavours.” These flavours are specific kinds of writing that Turnitin’s tool is not good at predicting as GenAI writing. These include:
Repetitive writing: the same words used again and again.
Lists, outlines, short questions, code, or poetry.
Developing writers, English-language learners, and those writing at middle and high school levels.
Adamson ends the video by saying, “we own our mistakes. We want to…share with you how and when we are wrong” (Turnitin, 2023, May 23). These mistakes, Adamson states, represent ~1%, or 1 in 100, of submissions through the tool.
While this may be acceptable to Turnitin, this 1% represents real student assignments, written by real students. By May 14, 2023, Turnitin reported that 38.5 million submissions had been submitted for examination by their GenAI detection tool, “with 9.6% of those documents reporting over 20% of AI writing and 3.5% over 80% of AI writing” (Merod, 2023, June 7). If we use Adamson’s rate of 1% false positives, 3.5% of 38.5 million submissions is 1.3 million; 1% of 1.3 million is 13,000 student papers that were found to be written in part by GenAI, when in fact they were not. If Adamson’s 1% false-positive rate is applied to the 9.6% of papers reported with over 20% of AI writing (3.7 million assignments), this total is about 37,000 assignments. Together, this is approximately 50,000 false positives affecting 50,000 students. For scale, two of Canada’s largest schools, the University of Alberta and York University, have student populations of 40,100 and 55,700, respectively. For the students, being accused of an academic violation can not only affect their academic record, but also cause anxiety, loss of scholarships, and cancellation of student visas. Turnitin’s AI scientist Adamson says that the 1% false positive rate is “pretty good…” (Turnitin, 2023, May 23).
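The arithmetic in this paragraph can be checked with a short sketch. It follows the same approach as the text, applying Adamson’s 1% rate to the number of flagged submissions in each band. The variable and function names are my own, chosen for illustration, and the unrounded results land close to the rounded figures quoted above.

```python
# Figures reported in the text (Merod, 2023, June 7; Turnitin, 2023, May 23).
TOTAL_SUBMISSIONS = 38_500_000   # submissions checked by May 14, 2023
SHARE_OVER_80 = 0.035            # 3.5% reported with over 80% AI writing
SHARE_OVER_20 = 0.096            # 9.6% reported with over 20% AI writing
FALSE_POSITIVE_RATE = 0.01       # Adamson's stated ~1% rate


def estimate_false_positives(total, flagged_share, fp_rate):
    """Return (flagged submissions, estimated false positives).

    Assumption: as in the text, the false-positive rate is applied
    to the flagged submissions in a band, not to all submissions.
    """
    flagged = round(total * flagged_share)
    return flagged, round(flagged * fp_rate)


flagged_over_80, fp_over_80 = estimate_false_positives(
    TOTAL_SUBMISSIONS, SHARE_OVER_80, FALSE_POSITIVE_RATE
)
flagged_over_20, fp_over_20 = estimate_false_positives(
    TOTAL_SUBMISSIONS, SHARE_OVER_20, FALSE_POSITIVE_RATE
)

print(f"Over 80% band: {flagged_over_80:,} flagged, {fp_over_80:,} false positives")
print(f"Over 20% band: {flagged_over_20:,} flagged, {fp_over_20:,} false positives")
print(f"Combined, as summed in the text: {fp_over_80 + fp_over_20:,}")
```

Changing FALSE_POSITIVE_RATE shows how quickly the number of affected students grows if the real rate is even slightly above Turnitin’s stated target.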
The University of Pittsburgh and Vanderbilt University have decided to not use Turnitin’s tool. The University of Pittsburgh
has concluded that “current AI detection software is not yet reliable enough to be deployed without a substantial risk of false positives and the consequential issues such accusations imply for both students and faculty. Use of the detection tool at this time is simply not supported by the data and does not represent a teaching practice that we can endorse or support.” Because of this, the Teaching Center will disable the AI detection tool in Turnitin effective immediately (Teaching Center doesn’t endorse, 2023, June 23).
Vanderbilt indicated that they’d “decided to disable Turnitin’s AI detection tool for the foreseeable future. This decision was not made lightly and was made in pursuit of the best interests of our students and faculty,” citing Turnitin’s lack of transparency about how the tool works as well as the false positive rate (Coley, 2023, August 16). Vanderbilt also did the math regarding the impact of false positives on their students:
Vanderbilt submitted 75,000 papers to Turnitin in 2022. If this AI detection tool was available then, around 3,000 student papers would have been incorrectly labeled as having some of it written by AI. Instances of false accusations of AI usage being leveled against students at other universities have been widely reported over the past few months, including multiple instances that involved Turnitin… In addition to the false positive issue, AI detectors have been found to be more likely to label text written by non-native English speakers as AI-written. (Coley, 2023, August 16).
Vanderbilt concluded, “we do not believe that AI detection software is an effective tool that should be used” (Coley, 2023, August 16).
Canadian higher education institutions have a mixed approach to AI detectors. On April 4, 2023, the University of British Columbia responded quickly to Turnitin’s AI detection tool, stating that they will not enable it. Their reasoning, among others, includes: “Instructors cannot double-check the feature results”; “Results from the feature are not available to students”; and the ability “of the feature to keep up with rapidly evolving AI is unknown” (University of British Columbia, 2023, April 4).
Nipissing University, in their senate-approved, June 2023 “Generative AI Guide for Instructors,” mentions that “use of generative AI ‘detectors’ is not recommended” (n.p.); the University of Waterloo similarly cautions faculty, “controlling the use of AI writing through surveillance or detection technology is not recommended” (Frequently Asked Questions, 2023, July 25). Conestoga College, in its guide to using Turnitin’s tool, instructs faculty to indicate that they are using the tool: “Without such notice, a student may at some point appeal” (Sharpe, 2023, June 27).
A paper published this month in the International Journal for Educational Integrity, “Evaluating the efficacy of AI content detection tools in differentiating between human and AI‑generated text,” which did not include Turnitin, found the “performance” of AI detection tools on GPT 4-generated content was “notably less consistent” in differentiating between human and AI-written text (Elkhatat, Elsaid, & Almeer, 2023, p. 6). “Overall, the tools struggled more with accurately identifying GPT 4-generated content than GPT 3.5-generated content” (p. 8). The findings of this study should raise questions about using GenAI detection tools in higher education:
While this study indicates that AI-detection tools can distinguish between human and AI-generated content to a certain extent, their performance is inconsistent and varies depending on the sophistication of the AI model used to generate the content. This inconsistency raises concerns about the reliability of these tools, especially in high-stakes contexts such as academic integrity investigations. (p. 12-13)
The paper concludes that “the varying performance [of detection tools on ChatGPT 3.5 and ChatGPT 4] underscores the intricacies involved in distinguishing between AI and human-generated text and the challenges that arise with advancements in AI text generation capabilities” (p. 14).
International students take the brunt, again
In higher education, it is well documented that international students, who make up “approximately 17% of all post-secondary enrollments in Canada” (Shokirova, et al., 2023, August 23), are accused of academic integrity breaches at a higher rate than domestic students (See for example, Adhikari, 2018; The complex problem…, 2019; Eaton & Hughes, 2022; Fass-Holmes, 2017; Hughes & Eaton, 2022). As we see in writing centres, undergraduate international students are often English-language learners, many of whom are writing academic papers in English at post-secondary levels for the first time. As a result, many undergraduate international students’ level of writing in academic English is low. Some students that I have tutored take several years of writing practice to attain a level of academic writing many in the academy consider “polished” or at post-secondary levels.
According to Turnitin, students who are English-language learners, developing writers, or writing at a secondary level are at a higher risk of false positives from their tool. Adamson admits that Turnitin’s false positive rate is “slightly higher” for these students: “Still near our 1% target, but there is a difference” (Turnitin, 2023, May 23). At the same time, Adamson also claims that Turnitin doesn’t see “any evidence” that the tool is “biased against English language learners from any country at any level” (Turnitin, 2023, May 23). Unfortunately, I was not able to find data published by Turnitin to substantiate these claims, including what the difference in the false-positive rate for these students is: What does Turnitin consider “near” their “1% target”? Is it 2%, 3.5%, 1.5%? Considering the large numbers involved, 38.5 million as of May 2023, even a 0.5% increase is significant.
What will happen in September?
Like the winter semester of 2023, it may well be that the first assignments submitted this month will start another round of changes to academic writing, academic integrity, and students’ use of GenAI tools. It will be important for institutions to monitor and update their policies and procedures regarding AI detection tools, like Turnitin, in response to possible changes to GenAI writing. Specific attention should be paid to international students in these cases, as they are already vulnerable within higher education.
Eaton, S. E., & Hughes, J. C. (2022). Academic Integrity in Canada. In S. E. Eaton (Ed.), Ethics and Integrity in Educational Contexts (Vol. 1, pp. xi-xvii). Springer. https://doi.org/10.1007/978-3-030-83255-1
Elkhatat, A. M., Elsaid, K., & Almeer, S. (2023). Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity, 19(17). https://doi.org/10.1007/s40979-023-00140-5
Fass-Holmes, B. (2017). International students reported for academic integrity violations: Demographics, retention, and graduation. Journal of International Students, 7(3), 644–669. https://doi.org/10.5281/zenodo.570026
Hughes, J. C., & Eaton, S. E. (2022). Student integrity violations in the academy: More than a decade of growing complexity and concern. In S. E. Eaton & J. C. Hughes (Eds.), Ethics and Integrity in Educational Contexts (Vol. 1, pp. 61–79). Springer. https://doi.org/10.1007/978-3-030-83255-1_3
On August 31, 2023, OpenAI posted Teaching with AI to their website, described as a guide “to accelerate student learning” using ChatGPT. This guide provides prompts to “help educators get started with” ChatGPT. These include prompts for lesson-planning development, creating analogies and explanations, helping “students learn by teaching,” as well as creating “an AI tutor.”