Abstract
Massive pretrained language models have garnered attention and controversy due to their ability to generate humanlike responses: attention because their phraseology and narratives are frequently indistinguishable from human-generated ones, and controversy because their convincingly presented arguments and facts are frequently simply false. Just how humanlike are these responses when it comes to dialogues about physics, in particular about the standard content of introductory physics courses? This case study explores that question by having ChatGPT, the preeminent language model in 2023, work through representative assessment content of an actual calculus-based physics course and grading its responses the same way human responses would be graded. As it turns out, ChatGPT would narrowly pass this course while exhibiting many of the preconceptions and errors of a beginning learner. A discussion of possible consequences for teaching, testing, and physics education research is provided as a starting point for more detailed studies and curricular efforts in the future.
Received 2 February 2023
Accepted 21 March 2023
DOI: https://doi.org/10.1103/PhysRevPhysEducRes.19.010132
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.