Can an AI-tool grade assignments in an introductory physics course?

Problem solving is an integral part of any physics curriculum, and most physics instructors would likely agree that the associated learner competencies are best assessed by considering the solution path: not only the final solution matters, but also how the learner arrived there. Unfortunately, providing meaningful feedback on written derivations is much more labor- and resource-intensive than grading only the outcome: currently, the latter can be done by computer, while the former involves handwritten solutions that need to be graded by humans. This exploratory study proposes an AI-assisted workflow for grading written physics-problem solutions, and it evaluates the viability of the actual grading step using GPT-4. It is found that the AI-tool is capable of providing feedback that can be helpful in formative assessment scenarios, but that for summative scenarios, particularly high-stakes ones, it should only be used for an initial round of grading that sorts and flags solution approaches.


I. INTRODUCTION

A. Generative Pre-trained Transformer
In fall 2022, the Generative Pre-trained Transformer (GPT) [1] rapidly took the world stage as a publicly available AI-tool with surprising capabilities. ChatGPT is a text-based interface to this underlying Large Language Model (LLM). GPT is essentially a tool that produces plausible fiction using a neural-network-based autocomplete algorithm; in that respect, it is similar to the autocomplete on smartphones that suggests likely next words when the user is composing text messages. However, GPT was trained on a massive text corpus gleaned from public and proprietary sources, and it was extensively fine-tuned by humans. Also, GPT does not work with words, but with finer-grained tokens, which are similar but not identical to syllables in a word.
ChatGPT allows for text-based dialogues: prompts by the user, responses by GPT. These dialogues are remarkably human-like, and the system would likely pass the Turing test in many situations [2]. Apart from continued human training by OpenAI, the company behind GPT, the system as a whole no longer learns while it is being used. However, it learns within the confines of a particular dialogue, so it can refer to statements made earlier in the same dialogue. At some point, though, it hits its internal so-called "token limit": it can only keep a limited number of tokens in memory, comparable to the working-memory limitations of a human. Dialogues thus cannot become too extensive before either an error message occurs or the system simply appears to forget what was stated earlier in the dialogue (which will turn out to be a limitation for this study).
The massive training effort behind GPT resulted in the emergence of capabilities that are not necessarily expected from a language model. GPT-4 has been found to pass several standardized exams in the upper percentiles [3][4][5]. A previous version, GPT-3, could already pass an introductory physics course at a nominal level [6], and there are indications that GPT-4 performs even better on physics concepts [7].
While ChatGPT is limited to text-based input and output, GPT-4 itself is multimodal, so it can accept image input [5]. Among other examples, the company demonstrates in advertising videos how handwriting and handwritten mathematical formulas are turned into machine-readable documents. Using this feature, however, requires access to its Application Programming Interface (API), which is unfortunately restricted by OpenAI: as of April 2023, there is a long waiting list, and thus this functionality could not be tested in this study.

B. Solving Physics Problems
The idea of grading problem solutions using AI has been around for a while [8], but was often hampered by the need to train the AI for specific problems; in other words, the AI learned from humans how to grade a specific problem, and in the end it mimicked their scoring of that problem. GPT and similar "pre-trained" systems promise a more universal solution, one that can grade, ad hoc, problems it has not encountered before. The fulfillment of this promise seems within reach, since these same "pre-trained" systems are also able to solve a wide variety of problems they have not encountered before [6].
Strategically solving physics problems requires logical, conceptual, and mathematical competencies [9,10], and hardly any topic in Physics Education Research has been investigated more extensively [11]. While the final solution to physics problems, as well as predetermined, scaffolded steps along the way, can be assessed to varying degrees using computer systems [12][13][14][15][16][17][18][19], a deeper analysis of learners' complex problem-solving competencies requires an evaluation of the solution path and derivation [20,21]. However, grading written solutions currently requires human effort, and the benefit of this more meaningful feedback for final learning outcomes may be curbed by having fewer such opportunities, due to limited human-grading resources, and by the time delay before such feedback is returned [22]. The purpose of this study is to explore whether meaningful feedback can be generated for written homework and exam problems while retaining the benefits of more frequent formative assessment with immediate feedback traditionally afforded by solution-focused online systems.

II. SCENARIO
The proposed scenario has multiple steps, illustrated in Fig. 1. In one possible setting, learners write problem solutions on paper; while this might seem quaint, most anybody would agree that typing mathematical formulas is cumbersome, and assembling them in formula editors like the one built into Microsoft Word is time-consuming and tends to be frustrating. The established standard for typing mathematics is LaTeX, and while some experts can "think" in LaTeX, paper and pencil are far more efficient and intuitive (there are even arguments that this materiality is essential [23]). This scenario is particularly well suited for exams, which can then take place in traditional, supervised, "offline" modes.
1. The first step is simply scanning the assignment papers into a PDF or image format; this could be accomplished with most copy machines using automatic paper feeding of the student worksheets, where the real challenge is likely paper jams from crumpled-up exam sheets.
2. The PDF is then processed by AI-based optical character recognition and translated into a machine-readable format, for example LaTeX. In this study, due to lack of access to the appropriate API, these first two steps are simulated; the reliability of the recognition step would be the subject of a separate, future study.
3. The LaTeX documents are then graded by an AI-tool, in this study GPT-4. This step might entail several independent rounds of grading, which are subsequently summarized. Variations include flagging solutions where grading results diverge for subsequent human arbitration, or categorizing the solutions into similarity classes as a preliminary step to more rapid human grading.
A second possible scenario for the first step is pen computing, where learners draw directly on a screen. Some modern tablets and laptops mimic the feel of paper with the appropriate friction, and students appear to be comfortable with this technology, seeing how many of them use it for lecture notes. A third scenario for the first step would be to photograph or scan the solutions with a smartphone and upload them to an online system. These latter two scenarios are less suited to exam settings, as being online means having access to all kinds of resources, communication channels, and online tools (including AI-tools!). These additional affordances of being online would either need to be incorporated into the tasks themselves (possibly making them more demanding) or be blocked by lock-down technologies [24].
The AI-component of a system like this is very unlikely to be installable on-premise, and likely only commercial cloud-based solutions are viable. This will raise privacy and data-security concerns that need to be addressed, particularly when dealing with mandatory assignments [25].

III. METHODOLOGY

A. Generation of Sample Solutions
As an example for this study, a simple time-dependent RC-circuit problem was chosen. This problem, shown as the prompt in Fig. 2, involves some conceptual, strategic, and mathematical challenges, but in one form or another it is likely part of most calculus-based introductory physics courses. The initial potential difference across the capacitor is given, even though it is not needed; this superfluous information makes it tempting to immediately calculate the initial current through the capacitor, leading to a less than straightforward solution, since that current later drops out. Particularly when not working symbolically, but immediately "plugging and chugging," this involves unnecessary steps [26].
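For reference, the symbolic route that the problem rewards can be written out in a few lines. This is the standard RC-discharge result, using only the values given in the prompt, and it makes explicit why the given potential difference $V_0$ cancels:

```latex
\begin{aligned}
I(t) &= \frac{V_0}{R}\,e^{-t/RC}, \qquad I(0) = \frac{V_0}{R},\\
I(t_2) &= \tfrac{1}{2}\,I(0)
  \;\Longrightarrow\; e^{-t_2/RC} = \tfrac{1}{2}
  \;\Longrightarrow\; t_2 = RC\ln 2,\\
t_2 &= (10\,\Omega)(100\times10^{-3}\,\mathrm{F})\ln 2 = (1\,\mathrm{s})\ln 2 \approx 0.693\,\mathrm{s}.
\end{aligned}
```

Computing $I(0) = 0.5$ A numerically first, as many of the sample solutions do, adds a step whose result divides out in the second line.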
Since GPT uses a probabilistic algorithm, presenting the same prompt twice will lead to different responses. This property was used to generate a set of 25 unique sample solutions for this study. GPT-4 has better reasoning capabilities than its predecessors, so it is more likely to produce correct solutions. To also have plausible but incorrect solutions in the sample, the majority of the solutions were generated by earlier releases.
As output format, LaTeX was chosen, since it would be a likely output of the AI-based optical character recognition (see Fig. 1). During output, ChatGPT renders display equations in a symbol font, so upon copying these expressions into a text file, the LaTeX source code would be lost; thus, inline expressions were requested. Workarounds like these would not be necessary when accessing GPT directly through an API.
[Figure excerpt: "The time constant is given by τ = RC" and its LaTeX source, "The time constant is given by $\tau=RC$".]

To find the time $t_2$ at which half the initial current flows through the resistor, we need to analyze an RC circuit.
In an RC circuit, the voltage across the capacitor as a function of time is given by: $V_C(t) = V_0 e^{-t/RC}$, where $V_0$ is the initial voltage across the capacitor (5V), $R$ is the resistance (10 Ohms), $C$ is the capacitance (100mF), and $t$ is the time. The current flowing through the resistor as a function of time is given by: $I(t) = \frac{V_0}{R} e^{-t/RC}$. Now, we need to find the time $t_2$ when the current is half of the initial current, so: $\frac{1}{2} I(0) = I(t_2)$. Substituting $I(0)$ and $I(t_2)$ with the given expressions, we get: $\frac{1}{2}\frac{V_0}{R} = \frac{V_0}{R} e^{-t_2/RC}$. Dividing both sides by $(V_0/R)$, we get: $\frac{1}{2} = e^{-t_2/RC}$. Taking the natural logarithm of both sides, we get: $\ln\frac{1}{2} = -\frac{t_2}{RC}$. Now, we can solve for $t_2$: $t_2 = RC\ln 2$. Substituting the given values of $R$ and $C$: $t_2 = (10\,\mathrm{Ohms})(100\times10^{-3}\,\mathrm{F})\ln 2$. $t_2 = 1\,\mathrm{s}\,\ln 2$. Therefore, the time $t_2$ when half the initial current flows through the resistor is approximately 0.693 s.

FIG. 2. Prompt and Solution 1 for this study (generated by GPT-4).

Arguably, these samples are setting up GPT to grade itself. While somewhat verbose and adopting the royal "we" of instructors and textbooks, for the purposes of this exploratory study, the solutions seem representative enough of what students might submit. They span the gamut from Solutions 1 and 12, which are almost perfect, to Solutions 9 and 19, which completely miss the mark, and they include the expected unnecessary calculations and transfer of numerical values from one formula to the next. As an aside, while generated to simulate human problem solving, this sample set illustrates the progression in reasoning capabilities between Legacy GPT-3.5 and GPT-4, achieved in just a few months. These solutions now provide the basis for the actual grading study (last step in Fig. 1).

B. Grading of Solutions
GPT-4 solved the problem correctly in all instances, so it seems appropriate to use it for grading. The solutions were scored on a scale from 0 (worst) to 4 (best) on a rubric of correctness of approach, correctness of symbolic derivations, correctness of the numerical result, and straightforwardness. In addition, one-sentence feedback was requested for each solution. The rubric scores were combined into a total score with a stronger weight on the final numerical result. Finally, the system was prompted to generate a correlation table between the solutions based on similarity of approach; Fig. 8 shows an example of the associated prompt.
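The study does not report the weights used to combine the rubric scores. The following is a minimal sketch, assuming a weighted average on the same 0-4 scale in which the numerical-result criterion counts double; the factor of 2, the criterion keys, and the function name are hypothetical.

```python
from typing import Dict

# Hypothetical weights: the study states only that the numerical result was
# weighted more strongly; the factor of 2 here is an assumption.
WEIGHTS: Dict[str, float] = {
    "argument": 1.0,
    "symbolic": 1.0,
    "numerical": 2.0,
    "straightforwardness": 1.0,
}


def total_score(rubric: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine 0-4 rubric scores into a weighted average on the same 0-4 scale."""
    missing = set(weights) - set(rubric)
    if missing:
        raise ValueError(f"missing rubric criteria: {sorted(missing)}")
    for name in weights:
        if not 0 <= rubric[name] <= 4:
            raise ValueError(f"score for {name} outside 0-4: {rubric[name]}")
    weighted_sum = sum(weights[name] * rubric[name] for name in weights)
    return weighted_sum / sum(weights.values())


print(total_score({"argument": 4, "symbolic": 4, "numerical": 2, "straightforwardness": 3}))
# 3.0, i.e. (4 + 4 + 2*2 + 3) / 5
```

Keeping the total on the 0-4 scale makes it directly comparable with the individual criteria; a plain weighted sum would work equally well if only rankings matter.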
Since GPT is probabilistic, it will not arrive at one deterministic score combination for each solution. Similar to having more than one human grader look at the same solution, each solution was scored several times. Due to GPT's token limit, not all 25 solutions could be fed into the system at the same time. They were thus processed in 75 dialogues, each with a randomly compiled batch of 5 solutions, leading to an average of 15 grading rounds per solution. Unfortunately, more often than not, ChatGPT did not provide the tables in the requested CSV format, which necessitated some subsequent manual reformatting of the responses in a text editor and Excel. The narrative one-sentence feedback messages from the, on average, 15 "graders" were summarized using GPT-4 into one longer statement reflecting the majority, using the prompt shown in Fig. 9.
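The batching arithmetic can be checked with a short simulation. This is a sketch under an assumption the text does not spell out: each of the 75 batches is drawn independently and uniformly, so only the average (75 × 5 / 25 = 15 rounds) is fixed, while individual solutions may receive somewhat more or fewer rounds. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def draw_batches(n_solutions: int = 25, n_batches: int = 75, batch_size: int = 5,
                 seed: Optional[int] = None) -> List[List[int]]:
    """Draw one batch of distinct solution numbers (1-based) per dialogue."""
    rng = random.Random(seed)
    pool = range(1, n_solutions + 1)
    return [sorted(rng.sample(pool, batch_size)) for _ in range(n_batches)]


def rounds_per_solution(batches: List[List[int]], n_solutions: int = 25) -> Dict[int, int]:
    """Count how many grading rounds each solution received."""
    counts = Counter(s for batch in batches for s in batch)
    return {s: counts[s] for s in range(1, n_solutions + 1)}


batches = draw_batches(seed=1)
rounds = rounds_per_solution(batches)
print(sum(rounds.values()) / len(rounds))  # 15.0: 75 batches x 5 / 25 solutions
print(min(rounds.values()), max(rounds.values()))  # individual counts scatter around 15
```

If every solution should receive exactly 15 rounds, a balanced design (for instance, 15 shuffled passes over the 25 solutions, each cut into five batches) would remove that scatter.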
The same solutions were independently graded by the author on the same rubric of correctness of argument, symbolic derivations, and numerical results, as well as straightforwardness. Of course, human grading decisions are also somewhat arbitrary, but this was not further considered in this study. The reader can, however, form their own judgment of each solution; all solutions can be found in Figs. 2-7.
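Checking the numerical result is the most mechanical part of the rubric. A minimal sketch, assuming answers have already been converted to seconds, compares each answer with the reference $t_2 = RC\ln 2 \approx 0.693$ s and sorts it into the three groups used in the results (correct, off by orders of magnitude, otherwise wrong); the tolerances and the function name are illustrative assumptions.

```python
import math

# Reference answer for the prompt: t2 = RC ln 2 with R = 10 Ohms, C = 100 mF.
R_OHM = 10.0
C_FARAD = 100e-3
T2_REF = R_OHM * C_FARAD * math.log(2)  # about 0.693 s


def classify_answer(t2_seconds: float, ref: float = T2_REF,
                    rel_tol: float = 0.02, decade_tol: float = 0.02) -> str:
    """Sort a numerical answer (in seconds) into rough categories.

    'correct': within rel_tol of the reference.
    'order of magnitude': right digits, but off by a nonzero power of ten.
    'wrong': anything else, including non-positive or non-finite values.
    The tolerances are assumptions, not values from the study.
    """
    if t2_seconds <= 0 or not math.isfinite(t2_seconds):
        return "wrong"
    if math.isclose(t2_seconds, ref, rel_tol=rel_tol):
        return "correct"
    decades = math.log10(t2_seconds / ref)
    nearest = round(decades)
    if nearest != 0 and abs(decades - nearest) <= decade_tol:
        return "order of magnitude"
    return "wrong"


for answer in (0.693, 6.93e-3, 6.93, 0.138, 16.4):
    print(answer, classify_answer(answer))
```

For example, 6.93 ms and 6.93 s both land in the order-of-magnitude group, whereas 0.138 s and 16.4 s do not.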
Solution 2: We can use the RC circuit equations to find the time when the current is half the initial value. The current in the circuit can be described by the equation: Where $I(t)$ is the current at time $t$, $I_0$ is the initial current, and $RC$ is the time constant of the circuit. The initial current, $I_0$, can be found using Ohm's law: We're looking for the time $t_2$ when the current is half the initial current: $I(t_2) = \frac{1}{2} I_0$. Now we substitute the values for $R$, $V$, and $C$: $I_0 = \frac{5\,\mathrm{V}}{10\,\Omega} = 0.5\,\mathrm{A}$. Plugging in the value of $I_0$ into the equation for $I(t_2)$: $0.25\,\mathrm{A} = \frac{1}{2}(0.5\,\mathrm{A})\,e^{-t_2/(10\,\Omega\cdot 100\times10^{-3}\,\mathrm{F})}$. We want to find $t_2$. First, we can simplify the equation by dividing both sides by 0.5 A: $0.5 = e^{-t_2/(10\,\Omega\cdot 100\times10^{-3}\,\mathrm{F})}$. Now, take the natural logarithm of both sides: $\ln(0.5) = -t_2/(10\,\Omega\cdot 100\times10^{-3}\,\mathrm{F})$. Then, multiply both sides by the time constant, $RC$: $t_2 = -\ln(0.5)\,(10\,\Omega\cdot 100\times10^{-3}\,\mathrm{F})$. And finally, calculate the value of $t_2$: $t_2 \approx 0.693\cdot(10\,\Omega\cdot 100\times10^{-3}\,\mathrm{F}) \approx 0.693\cdot 1\,\mathrm{s} \approx 0.693\,\mathrm{s}$. So, at $t_2 \approx 0.693$ seconds, half the initial current flows through the resistor.
Solution 3: To solve this problem, we will first find the equation for the current in the circuit as a function of time, and then we will find the time when the current is half of its initial value. The current flowing through the resistor at any given time $t$ can be determined using the capacitor discharge equation: Where $I(t)$ is the current at time $t$, $I_0$ is the initial current, $R$ is the resistance, $C$ is the capacitance, and $t$ is the time elapsed. We need to find $I_0$ first, which is the initial current flowing through the resistor at $t = 0$. We can find it using Ohm's law: Given values are $R = 10$ Ohms and $V = 5$ V. $I_0 = \frac{5\,\mathrm{V}}{10\,\mathrm{Ohms}} = 0.5$ A. Now we have the equation for the current: $I(t) = 0.5\,e^{-t/RC}$. We want to find the time $t_2$ when the current is half of its initial value: Divide both sides by 0.5: $0.5 = e^{-t_2/RC}$. Now take the natural logarithm of both sides: $\ln 0.5 = -\frac{t_2}{RC}$
Solution 4: To find the time when half the initial current flows through the resistor, we first need to find the initial current and the equation for the current through the resistor as a function of time. The initial current ($I_0$) can be found using Ohm's Law: where $V$ is the initial voltage across the capacitor (5V) and $R$ is the resistance (10 Ohms). $I_0 = \frac{5\,\mathrm{V}}{10\,\mathrm{Ohms}} = 0.5$ A. The current through the resistor as a function of time can be found using the formula for the current in an RC circuit: where $I(t)$ is the current at time $t$, $V$ is the initial voltage across the capacitor, $R$ is the resistance, $C$ is the capacitance, and $t$ is the time. In our case, $V = 5$ V, $R = 10$ Ohms, and $C = 100$ mF $= 0.1$ F. Now we need to find the time $t_2$ when half the initial current flows through the resistor, i.e., when $I(t_2) = 0.5 I_0 = 0.25$ A. We can set up the equation: Now we need to solve for $t_2$. First, we can simplify the equation: $0.5 = e^{-t_2/1\,\mathrm{s}}$. Next, take the natural logarithm of both sides: $\ln(0.5) = -\frac{t_2}{1\,\mathrm{s}}$. Finally, solve for $t_2$: $t_2 = -1\,\mathrm{s}\cdot\ln(0.5) \approx 0.6931$ s. So, at $t_2 \approx 0.6931$ s, half the initial current flows through the resistor.
Solution 5: To solve this problem, we'll first find the initial current and then determine the time it takes for the current to decrease to half that value. The initial current ($I_0$) flowing through the resistor can be found using Ohm's law: The current in the circuit at any time $t$ can be expressed as: We want to find the time $t_2$ when the current is half the initial current: Divide both sides by $I_0$: Now, take the natural logarithm of both sides: So, at $t_2 \approx 0.693$ s, half the initial current flows through the resistor.
FIG. 3. Solutions 2 through 5 to the prompt in Fig. 2.

Solution 6:
The current in the circuit at time $t$ is given by the equation: where $RC$ is the time constant of the circuit. At time $t_2$, half of the initial current flows through the resistor, so: Taking the natural logarithm of both sides, we get: Substituting the values given in the problem, we get: $t_2 = -10\cdot 0.1\cdot\ln\frac{1}{2} \approx 0.69$ seconds. Therefore, half the initial current flows through the resistor at $t_2 \approx 0.69$ seconds.

Solution 7: The current through a capacitor is given by $I(t) = C\,\frac{dV}{dt}$, and the voltage across a capacitor is given by $V(t) = V_0 e^{-t/RC}$, where $V_0$ is the initial voltage across the capacitor, $R$ is the resistance of the circuit, and $C$ is the capacitance of the capacitor. At $t = 0$, the voltage across the capacitor is $V_0 = 5$ V, so we have: $I(0) = -0.5$ A, where the negative sign indicates that the current is flowing in the opposite direction to the assumed direction. To find the time $t_2$ at which half the initial current flows through the resistor, we need to solve the equation: $I(t_2) = -\frac{0.5}{2} = -0.25$ A. We can use the equation for the voltage across the capacitor to solve for $t_2$: $V(t_2) = V_0 e^{-t_2/RC} = 2.5$ V. Taking the natural logarithm of both sides and rearranging, we get: $\ln\frac{V_0}{2.5} = \frac{t_2}{RC}$. Substituting in the values of $V_0$, $C$, and $R$, we get: Simplifying, we get: $t_2 = 10\times10^{-3}\ln(2) \approx 6.93$ ms. Therefore, half the initial current flows through the resistor at a time of $t_2 \approx 6.93$ ms.

Solution 8: The current through the circuit is given by the equation: where $i(t)$ is the current at time $t$, $V$ is the initial voltage across the capacitor, $R$ is the resistance, $C$ is the capacitance, and $e$ is the base of the natural logarithm.
To find the time $t_2$ when half the initial current flows through the resistor, we need to solve for $t$ in the equation: Substituting the equation for $i(t)$ and simplifying, we get: Taking the natural logarithm of both sides, we get: Solving for $t_2$, we get: $t_2 = -RC\ln\frac{1}{2} = RC\ln(2) \approx 0.69$ s. Therefore, half the initial current flows through the resistor at approximately $t_2 = RC\ln(2)$.

Solution 9:
The current through a capacitor in a circuit can be calculated as $i(t) = C\,\frac{dv(t)}{dt}$, where $v(t)$ is the voltage across the capacitor at time $t$. In this case, the voltage across the capacitor is given as $V = 5$ V, and the capacitance is $C = 100$ mF $= 0.1$ F. Therefore, the initial current flowing through the resistor is: $i_0 = C\,\frac{dV}{dt} = 0$. This is because at time $t = 0$, the capacitor is fully charged, so there is no current flowing through the circuit. The current at any later time $t$ can be calculated using: The voltage across the capacitor at any time $t$ can be found using: where $RC$ is the time constant of the circuit, which is equal to the product of resistance $R$ and capacitance $C$. In this case, $RC = 1$ s. Therefore, the current at time $t$ is: To find the time $t_2$ at which half the initial current flows through the resistor, we can set $i(t_2) = \frac{i_0}{2}$ and solve for $t_2$: $\ldots = 0$. This equation has no solution, which means that half the initial current never flows through the resistor. The reason for this is that a capacitor does not allow sudden changes in voltage or current, but instead resists them by storing or releasing charge. When the circuit is first connected, the capacitor is fully charged and there is no current flowing. As the voltage across the capacitor decreases, the current flowing through the resistor increases, but it never reaches half the initial current. The current continues to decrease exponentially over time, approaching zero as the voltage across the capacitor approaches zero.
Solution 10: The current flowing through the resistor and capacitor is given by the equation: where $RC$ is the time constant of the circuit, which is equal to $RC = 10^{-1}$ seconds in this case. The initial current $i(0)$ can be found by plugging in $t = 0$: $i(0) = \frac{V}{R} = 0.5$ A. To find the time $t_2$ when half the initial current is flowing through the resistor, we need to solve for $t$ in the equation: Substituting in the expression for $i(t)$ and solving for $t$, we get: Plugging in the values for $R$, $C$, and $V$, we get: $t_2 = -10^{-1}\ln 0.25\times10 \approx 0.138$ seconds. Therefore, half the initial current flows through the resistor at time $t_2 \approx 0.138$ seconds.

FIG. 4. Solutions 6 through 10 to the prompt in Fig. 2.
Solution 11: The current through a capacitor is given by: $i(t) = C\,\frac{dv(t)}{dt}$, where $i(t)$ is the current at time $t$ and $v(t)$ is the voltage across the capacitor at time $t$. The voltage across the capacitor decreases exponentially over time according to: where $V_0$ is the initial voltage across the capacitor, $R$ is the resistance of the resistor, $C$ is the capacitance of the capacitor, and $t$ is the time elapsed since the capacitor was fully charged. The current flowing through the resistor is given by Ohm's Law: $i_R(t) = \frac{v(t)}{R}$. Substituting the equation for $v(t)$ into this equation yields: To find the time $t_2$ when half of the initial current flows through the resistor, we need to solve the following equation for $t$: $i_R(t_2) = \frac{1}{2} i_R(0)$. Substituting the expressions for $i_R(t_2)$ and $i_R(0)$, we get: Simplifying this equation yields: Taking the natural logarithm of both sides, we get: Solving for $t_2$, we get: Substituting the values for $R$ and $C$, we get: Calculating this expression yields: $t_2 \approx 0.69$ s. Therefore, half of the initial current flows through the resistor at approximately $t_2 = 0.69$ s.

Solution 12: The current flowing through the resistor and capacitor at time $t$ is given by the equation: where $RC$ is the time constant of the circuit. At time $t = 0$, the current is: $i(0) = \frac{V}{R} = \frac{5}{10} = 0.5$ A. To find the time $t_2$ when half the initial current flows through the resistor, we need to solve the equation: $i(t_2) = \frac{1}{2} i(0)$. Substituting the expression for $i(t)$ and simplifying, we get: Taking the natural logarithm of both sides, we get: Substituting the values of $R$ and $C$, we get: $t_2 = (10\,\Omega)(100\,\mu\mathrm{F})\ln 2 = 0.693$ s. Therefore, half the initial current flows through the resistor after $t_2 = 0.693$ s.
Solution 13: The current flowing through the circuit can be described by the equation: where I is the current, V is the voltage, R is the resistance, C is the capacitance, and t is time.
To find the time $t_2$ when half the initial current flows through the resistor, we need to set $I$ equal to half of its initial value and solve for $t$: Canceling out the terms, we get: Taking the natural logarithm of both sides, we get: Solving for $t_2$, we get: Plugging in the values for $R$ and $C$, we get: $t_2 = -10\,\mathrm{Ohms}\cdot 100\,\mathrm{mF}\cdot\ln\frac{1}{2}$, $t_2 \approx 0.69$ s. Therefore, half the initial current flows through the resistor after approximately 0.69 s.

Solution 14:
The voltage across the capacitor as a function of time is given by: Where $V_0$ is the initial voltage and $RC$ is the time constant of the circuit. The time constant is defined as $RC = R\,C$. The current through the resistor as a function of time is given by: $\ldots\,e^{-t/RC}$. At time $t = 0$, the current is at its maximum value, which is given by: Half the initial current is equal to $\frac{1}{2} I_0$. The time $t_2$ at which this occurs can be found by setting $I(t_2) = \frac{1}{2} I_0$ and solving for $t_2$: Substituting the values for $R$ and $C$: $t_2 = -10\,\mathrm{Ohms}\cdot 100\,\mathrm{mF}\,\ln\frac{1}{2} = -10\cdot 10^{-3}\ln\frac{1}{2} \approx 6.93$ s. So the time $t_2$ at which half the initial current flows through the resistor is approximately 6.93 seconds.

FIG. 5. Solutions 11 through 14 to the prompt in Fig. 2.
Several measures of similarity between solutions were considered as a basis for clustering: similarity between the solutions' rubric scores, for both human and AI grades, as well as the similarity of approach provided in the responses of the AI-tool. Since, due to the token limit, only five solutions could be graded at a time, the overall 25 × 25 matrix was compiled by averaging the 5 × 5 matrices. Clustering was accomplished using the gplots package [28] within R [29].
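The averaging step can be sketched as follows, assuming each dialogue returns a k × k similarity matrix whose rows and columns follow the order of the solutions in that batch. Pairs that never shared a batch remain undefined, since no grader compared them; the function name is hypothetical, and the study itself performed the clustering in R rather than Python.

```python
from typing import List, Optional, Sequence


def assemble_similarity(batches: Sequence[Sequence[int]],
                        batch_matrices: Sequence[Sequence[Sequence[float]]],
                        n_solutions: int = 25) -> List[List[Optional[float]]]:
    """Average per-batch k x k similarity matrices into one n x n matrix.

    batches[b] lists the 1-based solution numbers of dialogue b in the same
    order as the rows and columns of batch_matrices[b]. Entries for pairs
    that never shared a batch are None.
    """
    sums = [[0.0] * n_solutions for _ in range(n_solutions)]
    counts = [[0] * n_solutions for _ in range(n_solutions)]
    for members, matrix in zip(batches, batch_matrices):
        if len(matrix) != len(members) or any(len(row) != len(members) for row in matrix):
            raise ValueError("similarity matrix does not match batch size")
        for i, a in enumerate(members):
            for j, b in enumerate(members):
                sums[a - 1][b - 1] += matrix[i][j]
                counts[a - 1][b - 1] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(n_solutions)]
        for r in range(n_solutions)
    ]
```

If the similarities lie between 0 and 1, a distance such as $1 - s$ for the defined entries can then be handed to a hierarchical-clustering routine; how to treat the undefined pairs is a separate modeling choice.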
Solution 15: The voltage across the capacitor $V_C$ and the current through the resistor $I$ are related by the equation: $V_C = IR$. Since the voltage across the capacitor and the capacitance are constant, the current through the resistor can be expressed as: $\ldots = 0.5$ A, where $Q$ is the charge on the capacitor. The current through the resistor decreases as the voltage across the capacitor decreases, so at time $t_2$, the current through the resistor will be half of its initial value: $I(t_2) = 0.5\cdot 0.5\,\mathrm{A} = 0.25$ A. The voltage across the capacitor can be expressed as: Setting $I(t_2) = 0.25$ A and solving for $t$ gives: So the time $t_2$ at which half the initial current flows through the resistor is approximately 69.3 ms.

Solution 16:
The voltage across the capacitor and the resistor can be expressed as: where $V_0$ is the initial voltage across the capacitor. The current flowing through the resistor can be expressed as: The time $t_2$ at which half the initial current flows through the resistor can be found by setting $I(t_2) = \frac{I_0}{2}$ and solving for $t_2$:

The initial current $i(0)$ through the resistor at time $t = 0$ can be calculated using Ohm's law: $i(0) = \frac{V}{R} = \frac{5\,\mathrm{V}}{10\,\Omega} = 0.5$ A. Let $t_2$ be the time when half the initial current, $i(0)/2$, is flowing through the resistor. The current through a resistor in an RC circuit is given by: Setting $i(t_2) = i(0)/2$ and solving for $t_2$: Substituting the values of $R$, $C$, and $i(0)$ … 43 ms. So, half the initial current is flowing through the resistor after approximately 11.43 milliseconds.

Solution 18:
The initial current $I(0)$ through the resistor at time $t = 0$ can be calculated using Ohm's Law: $I(0) = \frac{V}{R} = \frac{5}{10} = 0.5$ A. The current through the resistor at any time $t$ can be calculated using the exponential decay equation for an RC circuit: Where $RC$ is the time constant of the circuit, which is equal to $RC = R\cdot C = 10\cdot 10^{-5} = 10^{-4}$ seconds. To find the time $t_2$ when half the initial current is flowing through the resistor, we can set $I(t_2) = \frac{I(0)}{2}$ and solve for $t_2$: Using a calculator, we can find that $t_2 \approx 0.693/10^{-4} \approx 693$ ms. So, half the initial current flows through the resistor after approximately 693 milliseconds.

Solution 19:
The current through the resistor at any time $t$ can be given by the equation $I(t) = C\,\frac{dV}{dt}$. At time $t = 0$, the voltage across the capacitor is 5V and the current is given by $I(0) = C\,\frac{dV}{dt} = \frac{5}{10^{-6}} = 5\cdot 10^{6}$ Amps. The half-life of the current can be calculated as the time at which the current through the resistor is half of its initial value, or $I(t_2) = \frac{1}{2} I(0)$. Using Ohm's law $V = IR$, the voltage across the resistor at time $t_2$ can be calculated as: Finally, the time $t_2$ can be calculated using the equation for the current, $I(t) = C\,\frac{dV}{dt}$, and the voltage across the capacitor, $V(t) = V_0 e^{-t/RC}$, as follows:

FIG. 6. Solutions 15 through 20 to the prompt in Fig. 2.

Solution 21:
The initial current through the resistor is given by the equation $i(t) = C\,\frac{dv}{dt}$, where $v(t)$ is the voltage across the capacitor. When the capacitor is fully charged, the voltage across it is 5V. At time $t = 0$, the voltage across the capacitor begins to decrease as charge flows through the resistor, so $\frac{dv}{dt} < 0$.
The time $t_2$ at which half the initial current is flowing through the resistor can be found by solving for $i(t) = \frac{1}{2} i(0)$ and using the above equation for $i(t)$.

Solution 22: The time constant $\tau$ of an RC circuit is given by $\tau = RC$. In this case, $\tau = RC = (10\,\mathrm{Ohms})(100\,\mathrm{mF}) = 1$ s. The voltage across the capacitor as a function of time can be described by the equation $V_C(t) = V_0 e^{-t/\tau}$, where $V_0$ is the initial voltage across the capacitor. The current through the resistor is given by $I(t) = C\,\frac{dV_C}{dt}$. Substituting the expression for $V_C(t)$, we get: Rearranging and solving for $t$ gives us: Half the initial current is $I(t) = \frac{1}{2} I(0) = 250$ mA. Setting this equal to the expression for $I(t)$ and solving for $t$, we get: $\ldots = \frac{1}{2} \Longrightarrow t_2 = \tau\ln 2 = 1\,\mathrm{s}\,\ln 2 \approx 0.693$ s. So, half the initial current flows through the resistor at approximately $t_2 = 0.693$ s.

Solution 23: The current $i$ in an RC circuit at any time $t$ can be described by the equation $i(t) = C\,\frac{dV}{dt}$, where $V$ is the voltage across the capacitor. The voltage across the capacitor can be described by the equation $V(t) = V_0 e^{-t/RC}$, where $V_0$ is the initial voltage. Substituting the second equation into the first equation, we get: We want to find the time $t_2$ such that $i(t_2) = \frac{1}{2} i(0)$, where $i(0) = \frac{V_0}{R}$. Substituting the values for $V_0$, $R$, and $C$ into this equation and solving for $t_2$ gives us: $t_2 = RC\ln\frac{1}{10} = 10\,\mathrm{s}\,\ln\frac{1}{10} \approx 16.4$ s. So the time $t_2$ at which half the initial current flows through the resistor is approximately 16.4 seconds.

Solution 24:
The voltage across the capacitor and the resistor is the same at any given time.We can use the voltage across the capacitor to determine the current flowing through the resistor.The equation for the voltage across a capacitor is given by: where V0 is the initial voltage and t is the time elapsed.The current flowing through the resistor is given by: We want to find the time t2 at which the current is half of the initial current.Thus, Solving for t2, we get: t2 = RC ln 2 e ≈ 6.93RC Plugging in the values for R, C, and V0, we get: t2 ≈ 6.93 • 10 • 0.0001 = 0.00693 seconds.

Solution 25:
The voltage across the capacitor as a function of time $t$ is given by $V(t) = V_0 e^{-t/RC}$, where $V_0$ is the initial voltage and $RC$ is the time constant. The current flowing through the resistor as a function of time $t$ is given by $I(t) = C\,\frac{dV}{dt} = -\frac{V_0}{RC} e^{-t/RC}$. We want to find the time $t_2$ at which the current is half the initial current, $I(t_2) = \frac{1}{2} I_0$, where $I_0 = C\,\frac{dV}{dt}\big|_{t=0} = \frac{V_0}{R}$. Solving for $t_2$, we get:

Prompt: Consider the following numbered solutions to this problem: "At $t=0$, a resistor (resistance $R$=10 Ohms) is connected across a fully charged capacitor (voltage $V$=5V, capacitance $C$=100mF). At what time $t_2$ does half the initial current flow through the resistor?" Grade them from 0 (worst) to 4 (best) in terms of each correctness of argumentation, symbolic correctness, numerical correctness, and straightforwardness. Provide these grades in a comma-separated table with an additional column that gives a one-sentence feedback. Also, provide a comma-separated matrix of similarity between the solutions based on approaches, with the solution number as header row.

FIG. 8. Example of a grading prompt.
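For processing many batches, the Fig. 8 prompt can be assembled programmatically. The sketch below reuses the prompt wording from Fig. 8 and assumes, since the figure does not show this, that the numbered solutions are appended after the instructions; the function name is hypothetical, and sending the prompt to a model is deliberately left out.

```python
PROBLEM = (
    "At $t=0$, a resistor (resistance $R$=10 Ohms) is connected across a fully "
    "charged capacitor (voltage $V$=5V, capacitance $C$=100mF). At what time "
    "$t_2$ does half the initial current flow through the resistor?"
)

INSTRUCTIONS = (
    "Grade them from 0 (worst) to 4 (best) in terms of each correctness of "
    "argumentation, symbolic correctness, numerical correctness, and "
    "straightforwardness. Provide these grades in a comma-separated table with "
    "an additional column that gives a one-sentence feedback. Also, provide a "
    "comma-separated matrix of similarity between the solutions based on "
    "approaches, with the solution number as header row."
)


def build_grading_prompt(solutions: dict, problem: str = PROBLEM) -> str:
    """Assemble one grading prompt for a batch {solution_number: latex_text}.

    Assumption: solutions are appended after the instructions, in ascending order.
    """
    numbered = "\n\n".join(
        f"Solution {number}: {text}" for number, text in sorted(solutions.items())
    )
    return (
        f'Consider the following numbered solutions to this problem: "{problem}" '
        f"{INSTRUCTIONS}\n\n{numbered}"
    )


print(build_grading_prompt({3: "...", 1: "..."}))
```

Keeping the instructions fixed and varying only the appended batch makes the grading rounds comparable across the 75 dialogues.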
Prompt: The following are feedbacks on solutions to problem 1 on a test, prepared by different graders.Summarize these statements into one feedback reflecting the majority of the graders.
Tables I, II, and III show the rubric-grading results by GPT-4, based on an average of 15 rounds of grading within different batches of five solutions each.
Striking are the scores whose standard deviation around their average exceeds one point, namely:
• The correctness of the argument in Solutions 10, 15, 21, 23, 24, and 25: All of these solutions in the end arrive at the wrong numerical answer. Solutions 10 and 15 also needlessly calculate a numerical value for the initial current, while Solutions 21, 23, and 25 take a slight detour via the capacitive current $I(t) = C\,dV/dt$. Solution 24 starts with the expression for a charging instead of a discharging capacitor, which would still have arrived at the same answer had it not been for numerical errors.
• The correctness of the symbolic calculations in Solutions 15 and 23: Solution 15 is plug-and-chug, i.e., it plugs numbers into the first available formula and then transfers the numerical result to the next formula. Thus, the score for symbolic correctness of Solution 15 should have been consistently low in all grading rounds. Solution 23 makes an error in the very first line of the symbolic calculations by not eliminating the capacitance $C$.
• The correctness of the numerical results, in particular for Solutions 2, 3, 4, 6, 7, etc., that is, 18 of the 25 solutions altogether: 8 of these solutions actually have the correct numerical answer, 4 are off by one or more orders of magnitude, and 6 arrive at completely different numerical answers for various reasons. This long list may be the result of GPT-4 still falling short on numerical calculations.
• The straightforwardness of Solutions 15, 21, 23, and 25: All of these solutions already appeared in other lists of ambiguous scores.
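Flagging the ambiguous scores listed above amounts to pooling the grades from all rounds and checking their spread. The Python sketch below assumes that each reply contains the comma-separated table requested in Fig. 8 with a header row `solution,argument,symbolic,numerical,straightforwardness,feedback`; those column names, the use of the sample standard deviation, and the helper names are assumptions, and the two replies in the usage example are toy inputs, not data from the study.

```python
import csv
import io
import statistics
from collections import defaultdict

# Assumed column names of the comma-separated table returned by GPT-4.
ITEMS = ["argument", "symbolic", "numerical", "straightforwardness"]


def parse_reply(reply_text):
    """Yield (solution number, {item: score}) from one comma-separated table."""
    reader = csv.DictReader(io.StringIO(reply_text.strip()))
    for row in reader:
        yield int(row["solution"]), {item: float(row[item]) for item in ITEMS}


def aggregate(replies):
    """Collect every rubric score of every solution over all grading rounds."""
    scores = defaultdict(lambda: defaultdict(list))
    for reply in replies:
        for sid, grades in parse_reply(reply):
            for item, value in grades.items():
                scores[sid][item].append(value)
    return scores


def flag_ambiguous(scores, threshold=1.0):
    """List (solution, item, mean, stdev) wherever the stdev exceeds the threshold."""
    flagged = []
    for sid in sorted(scores):
        for item in ITEMS:
            values = scores[sid][item]
            if len(values) < 2:
                continue  # a sample standard deviation needs at least two rounds
            mean, sd = statistics.mean(values), statistics.stdev(values)
            if sd > threshold:
                flagged.append((sid, item, round(mean, 1), round(sd, 1)))
    return flagged


# Two toy replies in the assumed format.
header = "solution,argument,symbolic,numerical,straightforwardness,feedback\n"
replies = [
    header + '1,4,4,4,4,"Correct and clear."\n2,4,4,0,4,"Off by a factor of 1000."',
    header + '1,4,4,4,4,"Correct and clear."\n2,4,4,4,4,"Correct result."',
]
flagged = flag_ambiguous(aggregate(replies))
print(flagged)  # prints [(2, 'numerical', 2.0, 2.8)]
```

The `csv` module also copes with feedback sentences that contain commas, as long as the reply quotes that column.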
Overall, there does not appear to be a discernible pattern as to why these solutions lead to widely spread scores. The one-sentence feedback given by GPT-4 also does not provide helpful hints for determining which features of these solutions the algorithm may have latched onto.
The summaries of the one-sentence feedback messages sound very plausible, but many of them include incorrect or at least misleading statements. Examples of incorrect statements include:
• "The majority opinion indicates that there is an error in the calculation of the time constant, leading to an incorrect numerical result for $t_2$" for Solution 2, when in fact the numerical result is correct.
• "The majority of graders agree that Solution 24 uses the correct approach, equations, and symbolic representation," when in fact the equation for a charging capacitor is used.
Examples of misleading statements are:
• "One grader notes that the final answer should be in milliseconds, not seconds" for Solution 1; while it is true that milliseconds would have been more elegant, this is not required.
• "Solution 17 demonstrates the correct approach," when in fact it includes unnecessary calculations.

| Solution | Argument | Symbolic | Numerical | Straightforwardness | Summarized feedback |
| 1 | 4.0 ± 0.0 | 4.0 ± 0.0 | 3.9 ± 0.3 | 4.0 ± 0.0 | The majority of graders agree that Solution 1 is correct, straightforward, and well-structured. The solution uses the correct formula for the current in an RC circuit, derives the expression for the time $t_2$, and solves for the required time when half the initial current flows through the resistor. One grader notes that the final answer should be in milliseconds, not seconds. Overall, the solution is clear, concise, and logically explained. |
| 2 | 3.6 ± 0.9 | 3.7 ± 0.6 | 2.8 ± 1.6 | 3.8 ± 0.4 | The majority of graders agree that the solution to problem 2 has a correct and clear approach, using the appropriate RC circuit equations and argumentation. However, there seems to be some disagreement on the numerical calculation for $t_2$. The majority opinion indicates that there is an error in the calculation of the time constant, leading to an incorrect numerical result for $t_2$. Overall, the solution is well-organized and logically explained, but the numerical value should be revised. |
| 3 | 3.7 ± 0.5 | 3.7 ± 0.5 | 2.2 ± 1.6 | 3.7 ± 0.5 | The majority of graders agree that Solution 3 has a correct and straightforward approach, with correct argumentation and symbolic representation. However, there is a common issue with the numerical calculation for $t_2$, with some graders indicating the result is off by a factor of 1000 or in the wrong unit (milliseconds instead of seconds). Despite these errors, the solution is generally well-explained and follows a logical approach. |
| 4 | 3.7 ± 0.5 | 3.8 ± 0.4 | 3.3 ± 1.2 | 3.8 ± 0.4 | Solution 4 demonstrates a correct and straightforward approach to solving problem 4, using appropriate RC circuit equations and accurately deriving the time $t_2$. However, the majority of the graders point out a minor numerical error in the final answer, possibly due to rounding or unit conversion. Despite this, the solution is clear, well-argued, and symbolically correct. |
| 5 | 3.9 ± 0.4 | 3.9 ± 0.4 | 3.9 ± 0.4 | 3.9 ± 0.4 | Solution 5 is generally correct and straightforward, successfully applying the exponential decay equation for the RC circuit to derive the result for $t_2$. The majority of the graders appreciate the clear argumentation, symbolic correctness, and numerical accuracy. However, there is a minor numerical error due to rounding, and some graders suggest that the clarity and straightforwardness of the argument could be improved. Overall, the solution demonstrates a good understanding of Ohm's law, the time constant, and the current equation for an RC circuit. |
| 6 | | | | | The majority of the graders agree that the solution to problem 6 is correct, well-argued, and straightforward in its approach, with accurate symbolic representation and explanation. However, some graders noted an incorrect numerical value in the final result or a mistake in calculating $t_2$. |
| 7 | 3.1 ± 0.9 | 3.6 ± 0.6 | 3.1 ± 1.1 | 3.1 ± 0.9 | The majority of the graders found Solution 7 to be correct with accurate argumentation, symbolic representation, and numerical values. However, some graders mentioned issues such as incorrect current direction, voltage calculation, and final result, as well as an incorrect manipulation of the equation. Despite these discrepancies, most graders agreed that the solution was straightforward and used the correct approach. |
| 8 | | | | | […] representation, and a straightforward explanation. The solution correctly derives the formula for the time at which half the initial current flows and uses the appropriate approach. However, there is disagreement on the numerical value, with some graders noting that it is incorrect or off by a factor due to not considering units. Overall, the solution is mostly correct, but the final numerical answer may need to be revised. |
| 9 | 0.2 ± 0.4 | 1.0 ± 0.9 | 0.0 ± 0.0 | 0.8 ± 0.7 | Solution 9 demonstrates an incorrect understanding of the initial current, mistakenly assuming it to be zero. This leads to incorrect calculations, argumentation, and the false conclusion that half the initial current never flows through the resistor. The overall approach and reasoning are flawed, resulting in an incorrect solution. |
| 10 | 2.8 ± 1.1 | 3.1 ± 0.8 | 1.3 ± 1.3 | 2.9 ± 0.9 | The majority of the graders agree that the solution to problem 10 has correct argumentation and symbolic representation, with a clear and straightforward approach. However, there are errors in the numerical calculations, including the time constant and natural logarithm term, leading to an incorrect final result. Some graders also mentioned unnecessary complexity and incorrect derivations. |
| 11 | 3.5 ± 0.8 | 3.5 ± 0.7 | 2.8 ± 1.4 | 3.5 ± 0.8 | The majority of the graders agree that Solution 11 has a correct approach by using the exponential decay equation for the RC circuit and deriving the equation for $i(t)$. However, there seems to be a consensus that there are errors in the calculations, leading to an incorrect numerical result for $t_2$. The argumentation, symbolic correctness, and straightforwardness could be improved. Despite these errors, some graders still consider the solution to be well-structured and clear. |
| 12 | 4.0 ± 0.0 | 4.0 ± 0.0 | 4.0 ± 0.0 | 4.0 ± 0.0 | Solution 12 is a correct, well-structured, and straightforward approach to the problem. The majority of graders praised the clear argumentation, appropriate use of symbols and equations, and accurate numerical results. The solution effectively uses the RC circuit formula and provides a concise explanation. |
| 13 | 3.9 ± 0.3 | 3.9 ± 0.3 | 3.5 ± 1.0 | 3.9 ± 0.3 | The majority of the graders agree that Solution 13 is correct, clear, and straightforward. The solution uses the appropriate RC circuit equations, symbols, and approach to find the time when half the initial current flows through the resistor. While there is mention of a sign error and incorrect final result by a couple of graders, the overall consensus supports the solution's correctness and organization. |
| 14 | 3.2 ± 0.9 | 3.4 ± 0.8 | 2.3 ± 1.4 | 3.2 ± 0.9 | The majority of graders agree that Solution 14 has a correct approach and argumentation, with symbolic correctness and straightforwardness. However, there are inconsistencies in the numerical results, likely due to errors in calculations or sign errors. |
| 15 | | | | | The majority of graders agree that the solution to problem 15 is mostly accurate in terms of approach and equations used, but there are some minor mistakes in argumentation, symbolic correctness, and numerical calculations. The solution is considered to be somewhat straightforward and well-explained, but with a few errors in derivation and numerical results. |
| 16 | 3.3 ± 0.8 | 3.6 ± 0.5 | 1.9 ± 1.5 | 3.6 ± 0.5 | The majority of the graders agree that Solution 16 has a correct approach and demonstrates accurate symbolic representation, argumentation, and use of formulas. However, a calculation mistake in determining the initial current and subsequent errors in numerical calculations lead to an incorrect numerical answer for $t_2$. Despite these numerical issues, the solution is well-organized and clear. |
| 17 | 3.3 ± 1.0 | 3.8 ± 0.6 | 2.2 ± 1.4 | 3.6 ± 0.6 | Solution 17 demonstrates the correct approach, argumentation, and symbolic representation, but makes an error in the numerical calculation, specifically a sign error in the time constant, which leads to an incorrect final result. Despite this, the solution is considered straightforward and directly addresses the problem. |
| 18 | 3.1 ± 0.8 | 3.4 ± 0.6 | 2.6 ± 1.2 | 3.2 ± 0.7 | Solution 18 has the correct approach and argumentation, but the majority of graders point out an incorrect time constant calculation, which leads to an incorrect final numerical result. Despite this, the solution is well-structured and uses proper symbols and numerical values. |
| 19 | 0.3 ± 0.6 | 0.9 ± 0.9 | 0.2 ± 0.4 | 0.5 ± 0.7 | The majority of the graders agree that Solution 19 contains multiple errors, including incorrect initial current and voltage calculations, as well as an incorrect approach for finding the time $t_2$ when half of the initial current flows through the resistor. These errors lead to an incorrect final result. The solution also has issues with argumentation, symbolic representation, and straightforwardness. |
| 22 | 3.7 ± 0.6 | 3.9 ± 0.4 | 3.4 ± 1.2 | 3.7 ± 0.6 | The majority of graders agree that Solution 22 is correct and straightforward in its approach to finding the time when half the initial current flows through the resistor. The solution uses the time constant and the voltage across the capacitor as a function of time. However, a few graders noted some minor issues with the numerical calculation and representation of the time constant. Overall, the solution is well-explained and clear, with accurate derivation and symbolic correctness. |
| 24 | 2.9 ± 1.2 | 3.5 ± 0.7 | 1.7 ± 1.4 | 3.0 ± 0.9 | The majority of graders agree that Solution 24 uses the correct approach, equations, and symbolic representation to find the time at which half the initial current flows through the resistor. However, there are numerical errors in the calculation, particularly regarding the time constant and the voltage across the capacitor. Additionally, some graders noted unnecessary complexity in the solution. Overall, the solution is partially correct but suffers from incorrect numerical values and minor inconsistencies. |
| 25 | | | | | The majority of the graders agree that Solution 25 has an incorrect numerical result for $t_2$ due to inconsistencies, errors, and incorrect values in the derivation and equation. The approach is mostly correct and some steps are symbolically accurate, but there are sign errors and the method is not straightforward. |
The summarized feedback often includes remarks about numerical inaccuracies, even when the result is correct. However, particularly for the solution attempts that were completely incorrect, the feedback can be useful, for example: "Solution 9 demonstrates an incorrect understanding of the initial current, mistakenly assuming it to be zero."

B. Agreement with Manual Grading
Fig. 10 shows the correlation between the rubric scores resulting from manual and AI-grading. The scores are clearly positively correlated, with a particularly high $R^2$ for the correctness of the argument and of the numerical answer. This finding is surprising, since the correctness of the argument is more subjective than, for example, the correctness of the symbolic operations, and the correctness of the numerical answer is the rubric item with the highest standard deviation between grading rounds. Symbolic correctness has the lowest $R^2$ and is generally rated considerably higher by the AI than by the author.
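For a straight-line fit of AI scores against manual scores, $R^2$ is one minus the ratio of the residual to the total sum of squares, which for simple linear regression equals the squared Pearson correlation coefficient. A minimal Python sketch, with hypothetical function names and a toy input rather than the study's data:

```python
import statistics


def linear_fit(x, y):
    """Ordinary least-squares fit y ≈ slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    slope, intercept = linear_fit(x, y)
    my = statistics.mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy example: x would be manual scores, y the corresponding AI scores.
print(r_squared([0, 1, 2], [0, 2, 1]))  # prints 0.25
```

Applied per rubric item to the 25 pairs of manual and AI scores, this would give one $R^2$ value per rubric item.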
Fig. 11 shows the total scores on the problem, where the rubric items approach, symbolic correctness, and straightforwardness are each weighted 20%, and the correctness of the final numerical answer 40%. In addition, the correspondingly combined standard deviations of the AI-gradings are shown as error bars, and the data points are labeled by solution number.
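The weighted total can be computed as sketched below in Python. Two assumptions are made that the text does not spell out: the rubric item "approach" is taken to be the correctness-of-argument score, and the standard deviations of the four items are treated as independent and combined in quadrature, $\sigma_T = \sqrt{\sum_i (w_i \sigma_i)^2}$. The input values are the GPT-4 scores of Solution 1 from the grading table; `total_score` is a hypothetical name.

```python
import math

# Weights from the text: 20% each for approach (argument), symbolic correctness,
# and straightforwardness; 40% for the final numerical answer.
WEIGHTS = {"argument": 0.2, "symbolic": 0.2, "numerical": 0.4, "straightforwardness": 0.2}


def total_score(means, stdevs):
    """Weighted total score and its spread, assuming independent rubric items."""
    total = sum(WEIGHTS[item] * means[item] for item in WEIGHTS)
    # Quadrature sum of the weighted standard deviations (independence assumed).
    sigma = math.sqrt(sum((WEIGHTS[item] * stdevs[item]) ** 2 for item in WEIGHTS))
    return total, sigma


# GPT-4 means and standard deviations for Solution 1 from the grading table.
means = {"argument": 4.0, "symbolic": 4.0, "numerical": 3.9, "straightforwardness": 4.0}
stdevs = {"argument": 0.0, "symbolic": 0.0, "numerical": 0.3, "straightforwardness": 0.0}
total, sigma = total_score(means, stdevs)
print(f"{total:.2f} ± {sigma:.2f}")  # prints "3.96 ± 0.12"
```

If the rubric items were positively correlated across grading rounds, the true spread of the total would be larger than this quadrature estimate.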
The scores for the best solutions agree fairly well with the linear fit ($R^2 \approx 0.83$), but the fluctuations are larger in partial-credit situations. On average, GPT assigns total scores that are almost 0.9 points higher than those assigned by the author.

C. Clustering
Figure 12 shows dendrograms and heat maps of the similarities between the solutions based on manual grading, AI-grading, and AI-approach similarity, respectively. Based on these, three clusters emerge for both the manual and the AI-grading. For the AI similarity-of-approach measure (determined by GPT in response to the prompt in Fig. 8), either two clusters could be identified, or the dendrogram could be cut at a deeper level (indicated by dashed lines in Fig. 12), resulting in four clusters. However, the fourth cluster contains only Solution 23, so not treating it separately would be justifiable.
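Clusters of this kind can be approximated by agglomerative (hierarchical) clustering that stops at a chosen number of clusters, which corresponds to cutting the dendrogram at one level. The Python sketch below is not the procedure behind Fig. 12: it assumes Euclidean distance between four-item rubric-score vectors and average linkage, and it uses the GPT-4 means of eight solutions from the grading table as input; `agglomerate` and the other names are hypothetical. For the GPT-generated approach-similarity matrix, a distance derived from the similarity (for example, one minus a similarity scaled to [0, 1]) could be used instead.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two rubric-score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors, n_clusters):
    """Merge the closest clusters until n_clusters remain (a dendrogram cut)."""
    clusters = [[key] for key in vectors]
    while len(clusters) > n_clusters:
        # Pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


# GPT-4 means (argument, symbolic, numerical, straightforwardness) from the table.
scores = {
    1: (4.0, 4.0, 3.9, 4.0), 12: (4.0, 4.0, 4.0, 4.0), 5: (3.9, 3.9, 3.9, 3.9),
    10: (2.8, 3.1, 1.3, 2.9), 16: (3.3, 3.6, 1.9, 3.6), 24: (2.9, 3.5, 1.7, 3.0),
    9: (0.2, 1.0, 0.0, 0.8), 19: (0.3, 0.9, 0.2, 0.5),
}
print(agglomerate(scores, 3))  # prints [[1, 5, 12], [10, 16, 24], [9, 19]]
```

With these eight inputs, the three clusters correspond to high-, medium-, and low-scoring solutions; in practice, a library routine such as SciPy's hierarchical-clustering module would typically replace the hand-written loop.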
These clusters are shown in Table IV; they turn out to align roughly with the total scores in Fig. 11, with the low-, medium-, and high-scoring solutions forming the clusters.
All similarity measures identify the almost entirely wrong Solutions 9, 19, and 21 as members of the "low" cluster. The manual grading places Solution 23 in the middle cluster, as its only mistake is $C\,\frac{V_0}{-RC} = \frac{V_0}{RC}$, that is, an error in the calculation rather than a fundamental error; in the clustering according to similarity of approach, this solution was singled out. Solution 25 includes the same kind of error in calculating fractions, not once but twice.
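For comparison, the cancellation that Solution 23 misses can be written out as a short worked sketch, based on $V(t) = V_0 e^{-t/RC}$ and the values $V_0 = 5$ V, $R = 10\,\Omega$, and $C = 100$ mF from the problem statement; the capacitance drops out of the initial current, and the half-current condition then fixes $t_2$:

```latex
\begin{aligned}
I(t) &= C\,\frac{dV}{dt} = C\left(-\frac{V_0}{RC}\right)e^{-t/RC} = -\frac{V_0}{R}\,e^{-t/RC},
  \qquad |I_0| = \frac{V_0}{R} = \frac{5\,\mathrm{V}}{10\,\Omega} = 0.5\,\mathrm{A},\\
\frac{|I(t_2)|}{|I_0|} &= e^{-t_2/RC} = \frac{1}{2}
  \quad\Longrightarrow\quad t_2 = RC\ln 2 = (10\,\Omega)(0.1\,\mathrm{F})\ln 2 \approx 0.693\,\mathrm{s}.
\end{aligned}
```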
Solutions 11 and 18 made it into the highest cluster for manual grading, but are not found in the corresponding cluster for AI-grading. Both solutions arrive at the correct result (even though Solution 18 is somewhat nonchalant with the units in the second-to-last expression).
Overall, clustering the solutions based on the similarity of their rubric scores provides similar information for manual and AI-grading, while clustering based on similarity of approach is less informative. The automatically formed AI-based clusters could be used to sort solutions in preparation for human grading.
V. DISCUSSION

A. Potential for Assisting in Grading
GPT shows considerable promise as a tool for grading written solutions to physics problems, as evidenced by the results in Fig. 11. The system performs best at the extreme ends of the grading spectrum: clearly correct and clearly incorrect solutions are generally recognized reliably. Overall, however, the system currently falls short of the stringency required for standalone, unsupervised exam grading. Also, the system generally assigns more points than the author would.
While there are proposals to mitigate these shortcomings through "high transparency" assessments with multiple attempts and regrade requests [30], students will likely invoke these only for false negatives, i.e., scores that are lower than they expected. Since erroneously high scores would go unchallenged, this could lead to grade inflation. Instead of waiting for student complaints, instructional staff will need to verify the scores before the exam is returned.
As is typical for the tool, the feedback sounds plausible but may be misleading. In grading problems, GPT-4 still exhibits some of the same weaknesses with symbolic and numerical calculations that its predecessor GPT-3 showed in solving problems [6]. OpenAI promises an optional integration of Wolfram Alpha [31], which may solve this issue, but, as with the API, access to the required plugin is subject to a waiting list.

B. Potential for Assisting in Learning
Besides grading exams, and especially in connection with smartphone-based scanning, GPT or similar tools could be used to assist students in learning. This is a far more classic domain for AI in physics education [32,33]. In that context, it is important that the system can pinpoint particular errors. At the moment, though, this is still somewhat doubtful. For example, when prompted to find the mistake in Solution 14, the system responds:
The mistake in the solution is in the definition of the time constant. The correct definition is $RC = R \cdot C$. The rest of the solution is correct. We need to correct the time constant and then find the correct value of $t_2$.
The time constant is defined as $RC = R \cdot C$. In this case, $R = 10\,\Omega$ and $C = 100\,\text{mF} = 100 \cdot 10^{-3}\,\text{F}$, so the time constant is:
Now we can substitute the corrected time constant into the equation for $t_2$: $t_2 = -RC\ln\frac{1}{2} = -1\,\text{s}\,\ln\frac{1}{2} \approx 0.693\,\text{s}$
So the correct time $t_2$ at which half the initial current flows through the resistor is approximately 0.693 seconds.
The error is not really in the "definition" of the time constant, but simply in the calculation of the powers of ten.
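The powers-of-ten issue can be checked with a few lines of Python, using only the values from the problem statement. The script computes $RC$, $|I_0|$, and $t_2 = RC\ln 2$, and then shows that a time constant too large by one power of ten scales $t_2$ by the same factor; this is a numerical check only, not a reconstruction of any particular solution.

```python
import math

R = 10.0      # resistance in ohms
C = 100e-3    # 100 mF converted to farads (100 * 10**-3 F)
V0 = 5.0      # initial capacitor voltage in volts

tau = R * C               # time constant RC in seconds
I0 = V0 / R               # magnitude of the initial current in amperes
t2 = tau * math.log(2)    # from exp(-t2/RC) = 1/2

print(f"RC = {tau:.3f} s, |I0| = {I0:.2f} A, t2 = {t2:.3f} s")
# prints "RC = 1.000 s, |I0| = 0.50 A, t2 = 0.693 s"

# A time constant too large by one power of ten scales t2 by the same factor.
t2_off = (10 * tau) * math.log(2)
print(f"t2 with RC = {10 * tau:.0f} s: {t2_off:.2f} s")
# prints "t2 with RC = 10 s: 6.93 s"
```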
When prompted to pinpoint the error in Solution 23, the response is:
The mistake in the solution is in the derivation of the equation $i(t_2) = \frac{1}{2}i(0)$. The correct equation should be:
However, the solution incorrectly equates this expression to the equation for $i(t)$:
The correct equation for $i(t)$ should be:
Now, we can set up the correct equation for $i(t_2)$:
Now, we solve for $t_2$: $e^{-t_2/RC} = \frac{1}{2}$, $-\frac{t_2}{RC} = \ln\frac{1}{2}$, $t_2 = RC\ln 2 = 10\,\text{s}\,\ln 2 \approx 6.93\,\text{s}$
So the time $t_2$ at which half the initial current flows through the resistor is approximately 6.93 seconds.
The system correctly found that $C$ was not cancelled, but then proceeded to calculate the answer incorrectly, using a time constant of 10 s instead of 1 s. The feedback given by the system can be helpful, but needs to be evaluated critically by the learner, which in itself is an increasingly valuable competency, as AI will almost certainly become ubiquitous in everyday life.
In any case, the system should not simply give away the solution; unfortunately, using the prompt "Act as a socratic teacher and assist a learner with this solution attempt" made the reply more personable and supportive ("You're on the right track, but there is a small mistake . . ."), but did not stop the system from providing a solution.

VI. LIMITATIONS
This study is limited to one particular tool, GPT-4, at one particular point in time, mid-April 2023. Given the rapid development of AI-tools, it can thus provide nothing more than a snapshot of what is minimally possible. The initial step, the conversion of handwritten solutions involving mathematical expressions into a machine-readable format, could not be investigated here due to lack of access to the required Application Programming Interface (API).
The author had to make choices when providing prompts to GPT. While some experimentation went into formulating the prompts, the probabilistic nature of the responses meant that some decisions were based on anecdotal evidence, and better prompts framing the solutions could well have led to more reliable results.
GPT is only one of the rapidly evolving tools becoming broadly available. There are competing solutions by Google [34], NVIDIA [35], and Microsoft [36], which may perform better or worse, but could not be evaluated here.

VII. OUTLOOK
The next step beyond this exploratory study, enabled by access to the API (of GPT or of other tools on the market), would be to fully implement the workflow in Fig. 1; the performance of the optical character recognition will also be crucial to the reliability of future systems.
A future study should use authentic student work, for example from exams in large-enrollment physics courses, rather than having GPT grade its own solutions, and compare the results from AI-grading to those from a traditional grader pool. Particularly in high-stakes exams, grading usually involves more than one round, so inter-rater reliability could also be established for the human graders. As a welcome side effect, implementing the earlier steps in Fig. 1 could improve workflow management even for human graders, as less paper would be shuffled around and grades could be adjusted more easily.

VIII. CONCLUSION
In this exploratory study, GPT has shown considerable potential for grading freeform student work in physics. While AI-assigned grades correlate strongly with manually assigned grades, they are currently not reliable enough for summative assessments such as high-stakes exams. The system is, however, reliable enough to assist human graders by pre-sorting or clustering solutions and by providing preliminary scores. GPT remains hampered by its limited and inconsistent ability to carry out symbolic and numerical calculations, so several independent grading rounds are currently needed. The narrative feedback provided by the system seems plausible, but still frequently falls short of being reliable. The system can be helpful in formative assessment, but even in that scenario, learners need to critically evaluate its responses.
milliseconds, half the initial current will be flowing through the resistor.

| 20 | 3.1 ± 0.7 | 3.5 ± 0.5 | 2.0 ± 1.3 | 3.3 ± 0.6 | Solution 20 generally employs the correct approach and uses the exponential decay equation for solving the problem, but there are errors in the numerical calculations, specifically in the time constant and initial current. These inaccuracies lead to an incorrect final result. While some graders appreciate the clear and straightforward steps, the majority point out the numerical errors affecting the outcome. |

| 23 | 2.1 ± 1.5 | 2.6 ± 1.1 | 1.0 ± 1.6 | 2.1 ± 1.4 | The majority of the graders find Solution 23 to have a correct approach and clear step-by-step argumentation. However, there are concerns about the incorrect final numerical result and potential errors in the calculations involving the exponential term and natural logarithm. Some graders also mention an incorrect use of the time constant, incorrect equation, and wrong numerical value. |

FIG. 11. Correlation between manual and AI-grading total scores. For each data point, the solution number and combined AI-grading standard deviations are indicated.

FIG. 12. Dendrograms and heat maps of similarities between solutions based on manual grading (top panel), AI-grading (middle panel), and AI-approach similarity (bottom panel). The cluster cuts are indicated by purple lines; the dashed lines indicate possible deeper cuts.

TABLE II. AI-grading of

TABLE III. AI-grading of Solutions 20-25.

| 21 | … ± 1.1 | 1.9 ± 1.0 | 0.7 ± 0.8 | 1.8 ± 1.1 | The majority of the graders agree that Solution 21 has an incorrect approach, incorrect or incomplete argumentation, and incorrect numerical calculations. The relationship between $i(t)$ and $dv/dt$ is also incorrect, leading to a wrong calculation for $t_2$. Some graders acknowledge clear explanation, symbolism, or straightforwardness in parts of the solution, but overall, the solution is deemed incorrect and weak. |

TABLE IV. Clustering based on the dendrograms and heat maps in Fig. 12.