University examiners fail to spot ChatGPT answers in real-world test

Exam answers written by ChatGPT for a psychology degree mostly went undetected and received higher grades than work by real students.


Ninety-four per cent of university exam submissions generated by ChatGPT were not detected as the work of artificial intelligence, and they tended to score higher than real students’ work.

Peter Scarfe and his colleagues at the University of Reading, UK, used ChatGPT to generate answers to 63 assessment questions across five psychology undergraduate modules. Because students took these exams at home, they were allowed to use notes and references, but not AI.

The AI-written answers were submitted alongside real students’ work and made up 5% of the scripts the academics marked. The markers were not told that they were grading the work of 33 fake students whose answers had been generated by ChatGPT.

The assessments included both short answers and longer essays. Each prompt given to ChatGPT began with the words “Including references to academic literature but not a separate reference section”, followed by the exam question.

Across all modules, just 6% of AI submissions were flagged as suspicious, and in some modules no AI-generated work was flagged at all. “On average, the AI responses gained higher grades than our real student submissions,” says Scarfe, although results varied from module to module.

“Current AI tends to struggle with more abstract reasoning and integration of information,” he says. Even so, across all 63 AI submissions, there was an 83.4% chance that the AI work outscored that of the students.

The researchers say their study is the largest and most rigorous of its kind to date. Although it examined only Reading’s psychology degree, Scarfe thinks the findings should concern all of academia. “I have no reason to think that other subject areas wouldn’t have just the same kind of issue,” he adds.

“The results show exactly what I’d expect to see,” says Thomas Lancaster at Imperial College London. Generative AI is known to produce reasonable answers to simple, constrained textual questions, he says, and unsupervised assessments with short answers have always been vulnerable to cheating.

The marking workload also limits academics’ capacity to spot AI cheating. “Time-pressured markers of short answer questions are highly unlikely to raise AI misconduct cases on a whim,” says Lancaster. “I am confident this isn’t the only institution doing this.”

Scarfe believes that tackling the problem at its source is nearly impossible. Instead, the sector must rethink its assessments. “I think it’s going to take the sector as a whole to believe that we’re going to have to build AI into our student assessments,” he adds.
