The recent debut of the artificial intelligence chatbot ChatGPT has generated significant excitement over its ability to produce human-like text and hold conversations. However, researchers have identified several telltale signs that can help distinguish AI chatbots from humans. A study published June 7 in the journal Cell Reports Physical Science details these signs and introduces a tool that identifies AI-generated academic science writing with more than 99% accuracy.
The study’s first author, Heather Desaire, a professor at the University of Kansas, said the goal was a method simple enough that even high school students could use it to build AI detectors for different types of writing. Desaire stressed the need to address the problems that come with AI writing: one of the most significant is that the chatbot assembles text from many sources with no accuracy check, so reading it resembles playing the game “Two Truths and a Lie.”
Although several AI text detectors are available online and perform reasonably well, none is designed specifically for academic writing. To fill this gap, the research team built a tool tailored to academic science writing, focusing on a type of article called a perspective. Perspectives provide an overview of a specific research topic and are written by scientists. The team selected 64 perspectives and generated 128 ChatGPT-written articles on the same research topics to train the model. In comparing the two sets, they found a telling indicator of AI writing: predictability.
In contrast to AI, human writing shows more structural variety: paragraphs differ in sentence count and total word count, and sentence lengths fluctuate more. Differences in punctuation and vocabulary are also giveaways. For instance, scientists favor words like “however,” “but,” and “although,” whereas ChatGPT leans on terms like “others” and “researchers.” In all, the research team identified 20 such characteristics for the model to consider, a few of which are sketched in the example below.
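To make these cues concrete, here is a minimal sketch of how features of this kind might be computed. It is not the team’s published code: the exact definitions of their 20 features and the classifier they used are not given in this article, and the marker-word lists below come only from the examples quoted above.

```python
import re
from statistics import pstdev

# Illustrative only: words scientists reportedly favor vs. words ChatGPT
# favors, taken from the examples quoted in the article above.
HUMAN_MARKERS = ("however", "but", "although")
AI_MARKERS = ("others", "researchers")

def extract_features(text: str) -> dict:
    """Compute a few paragraph-structure, punctuation, and vocabulary features."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s.strip()]
    words = text.lower().split()

    sent_counts = [len([s for s in re.split(r"[.!?]+\s+", p) if s.strip()])
                   for p in paragraphs]
    para_words = [len(p.split()) for p in paragraphs]
    sent_lengths = [len(s.split()) for s in sentences]

    return {
        # Variability measures: human writing should score higher on these.
        "sentences_per_paragraph_sd": pstdev(sent_counts) if sent_counts else 0.0,
        "words_per_paragraph_sd": pstdev(para_words) if para_words else 0.0,
        "sentence_length_sd": pstdev(sent_lengths) if sent_lengths else 0.0,
        # Punctuation counts, one of the giveaways the article mentions.
        "question_marks": text.count("?"),
        "semicolons": text.count(";"),
        # Marker-word rates per word of text.
        "human_marker_rate": sum(words.count(w) for w in HUMAN_MARKERS) / max(len(words), 1),
        "ai_marker_rate": sum(words.count(w) for w in AI_MARKERS) / max(len(words), 1),
    }
```

Feature vectors like these could then be fed to any standard classifier trained on the 64 human-written and 128 ChatGPT-generated articles; the article does not name the specific model the team used.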
When tested, the model identified full AI-generated perspective articles with 100% accuracy and individual paragraphs within those articles with 92% accuracy. The team’s model also significantly outperformed an existing AI text detector in similar tests.
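The gap between the paragraph-level (92%) and article-level (100%) accuracies suggests that decisions pooled across many paragraphs are more reliable than any single one. The article does not say how the team combined paragraph decisions into an article-level call; majority voting is one plausible scheme, sketched below with hypothetical labels.

```python
from collections import Counter

def classify_article(paragraph_predictions: list[str]) -> str:
    """Pool per-paragraph labels ("human" or "ai") into one article-level call."""
    return Counter(paragraph_predictions).most_common(1)[0][0]

# One misclassified paragraph need not flip the article-level verdict,
# consistent with paragraph accuracy trailing article accuracy.
print(classify_article(["ai", "ai", "human", "ai"]))  # -> ai
```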
Moving forward, the team plans to test the model on larger datasets and across other types of academic science writing. They are particularly interested in whether it can still detect AI-generated text as chatbots grow more sophisticated.
Although many people are curious about whether the research can be used to determine if students have written their own papers, Desaire clarified that the model was not designed for catching AI-generated student essays. However, she emphasized that individuals can replicate the team’s methods to build models suited for their specific purposes.
Source: Cell Press