Using Virtual Reality in Electrostatics Instruction: The Impact of Training

Recent years have seen a resurgence of interest in using Virtual Reality (VR) technology to benefit instruction, especially in physics and related subjects. As VR devices improve and become more widely available, there remain a number of unanswered questions regarding the impact of VR on student learning and how best to use this technology in the classroom. On the topic of electrostatics, for example, a large, controlled, randomized study performed by Smith et al. (2017) found that VR-based instruction had an overall negligible impact on student learning compared to videos or images. However, they did find a strong trend for students who reported frequent video game play to learn better from VR than from other media. One possible interpretation of this result is that extended video game play provides a kind of "training" that enables a student to learn more comfortably in the virtual environment. In the present work we consider whether a VR training activity that is unrelated to electrostatics can help prepare students to learn electrostatics from subsequent VR instruction. We find that preliminary VR training leads to a small but statistically significant improvement in student performance on our electrostatics assessment. We also find that student-reported game play is still correlated with higher scores on this metric.


I. INTRODUCTION
Many topics in physics are inherently three-dimensional (3D), but are usually taught using two-dimensional media such as whiteboards and computer screens. Stereoscopic virtual reality (VR) allows students to view 3D scenes with depth perception, which should be advantageous for teaching certain content in physics and other STEM disciplines. Efforts to develop stereoscopic VR visualizations for physics began in the mid-1990s 2-4 and continue to the present day (e.g. [5][6][7][8][9][10] and references therein) as the technology improves.
The growth in VR 11 content for physics should be followed by detailed studies of the impact of these visualization methods on student learning. The studies that have been performed in physics and related STEM fields report varying degrees of success [12][13][14][15][16][17][18][19][20][21][22] , including many cases in which stereoscopic visualization techniques did not prove to be pedagogically more valuable than more conventional visualization methods. In this paper we will present data from a new study in a large introductory electromagnetism class at Ohio State University that will address these questions. A particularly affordable way to provide students with a reasonably high-quality VR experience, and the one we emphasize in this paper, is so-called Google Cardboard 23 , in which a typical smartphone is placed in a cardboard or plastic headset that may cost only a few dollars. This reduced cost is important because it means each student can potentially have their own VR headset, so that VR can become a regular part of instruction. The reduced cost allowed us to perform a large study using a set of six affordably priced smartphones.
Prior studies investigating the effectiveness of VR in physics and astronomy 9,12-22 have yielded mixed results. Although students given VR interventions often report being more engaged with the material, and physical immersion in VR has been shown to increase spatial awareness in search tasks 24 , the advantage of VR over other media in achieving gains in specific learning outcomes is still unclear. Unfortunately, because of the prohibitive cost of conventional VR headsets, many of these prior studies have limited sample sizes and in some cases VR treatment was not compared to a control group.
There have been a few large studies with careful controls. Madden et al. 21 consider a VR intervention for an astronomy course on the topic of moon phases. That study had 172 participants across three treatment groups (VR, computer, and "hands-on"/control), and found no statistically significant difference in learning gains between treatment groups.

A fuller description of their study appears in 22 .
Other large studies by Smith et al. 1 and Porter et al. 25 , which include several authors of this paper, did not find statistically significant differences in pre-post test gains for VR compared to other media on topics of electrostatics and magnetostatics. The studies involved, respectively, 301 and 289 participants from college-level introductory physics classes. Also of note is a study by Greenwald et al. 26 in which 20 college students completed activities relating to electrostatics and answered questions. Students who completed the activities in a VR headset while interacting with the virtual environment did not outperform students who completed essentially the same activities by drawing on a tablet (i.e. a 2D medium).
If one looks outside the physics content areas just described, there do exist large studies in which statistically significant effects of stereoscopic VR compared to other media have been detected. A notable example from college-level mathematics is Porter and Snapp 27 .
A recently published meta-study by Merchant et al. 28 considered dozens of K-12 VR studies and found (among other conclusions) that VR content overall tends to be effective in producing learning gains. However, the goal of Merchant et al. 28 was not to weigh the usefulness of stereoscopic VR versus more traditional media, and the meta-study counted non-stereoscopic virtual worlds accessed through conventional desktop and laptop computers as VR. So while the paper is very interesting and thorough, its relevance is in many ways oblique to the work that we will describe here.
In Smith et al. 1 , although VR did not prove to be more effective for students in general, it was found that students who reported frequent video game play (a.k.a. "gamers") and were given the VR treatment had much higher gains than any other group (non-gamers, or gamers who received electrostatics instruction from video renderings or images). Porter et al. found a similar trend for "gamer" students to significantly outperform non-gamer peers on magnetostatics assessments after viewing magnetostatics content, although, interestingly, the VR treatment did not help the gamer students more than other media, as it did in Smith et al. Due to a lack of independently validated assessments for electrostatics with a high fraction of 3D questions, we developed a suite of questions as a preliminary survey of 3D electrostatics (see the Methods section). The survey is only briefly summarized there because of page constraints.
The reliability of this survey is discussed below, along with student performance.

II. METHODS
The subjects of this work were students in the second semester of an introductory calculus-based physics course at a large Midwestern university, offered in autumn. This course was being offered "off-sequence", meaning that students who begin physics in their first semester would have taken this course in spring. Students were offered the equivalent of one homework assignment's course credit for coming to our lab and participating in either the research study or an alternative assignment of roughly equivalent length. Of 281 initial respondents, 279 agreed to participate in research.
As students entered the testing area, they were randomly assigned to one of two treatment types: VR with preliminary training, and VR with no initial training. The assessments were identical for all students, regardless of treatment type, except for a few questions posed during the preliminary training, which were unrelated to electrostatics. The students' average overall performance in physics was fairly constant between treatment types, as determined by post-hoc analysis of students' final scores as a percentage of points in their physics course (Training: (84.5 ± 0.9)%, No Training: (84.2 ± 0.9)%, p > 0.8). There was almost no variation in the percentage of students reporting their sex as female in the two treatment types (Training: 20%, No Training: 20%). Although gender identity would be a better descriptor of participants, gender identity data were not available.

A. Treatments
VR visualizations were created as Android smartphone applications. The apps were written using Unity, a cross-platform game engine developed by Unity Technologies 30 , and the Google VR SDK for Unity. Smartphones were placed in plastic goggles which have lenses to focus the near point of the eye, and a divider to split the field of view. The smartphone displays an app in a split-screen mode so that 3D phenomena are shown on the right half of the phone to the right eye from an angle slightly to the right, and the equivalent is shown to the left eye from an angle slightly to the left. This creates a stereoscopic 3D virtual reality experience giving the impression of depth perception.
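The split-screen rendering described above amounts to drawing the scene from two camera positions separated horizontally. The following Python sketch illustrates that geometry only. It is not the Unity or Google VR SDK code used in the study, and the function name `eye_positions`, the right-handed coordinate convention, and the 0.064 m eye separation are assumptions made for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def eye_positions(head, forward, up, ipd=0.064):
    """Return (left_eye, right_eye) camera positions for a stereo pair.

    head, forward, up: 3-tuples in a right-handed coordinate system
    (an assumed convention). ipd: assumed eye separation in metres.
    """
    right = cross(forward, up)  # points to the viewer's right
    norm = math.sqrt(sum(c * c for c in right))
    if norm == 0:
        raise ValueError("forward and up must not be parallel")
    right = tuple(c / norm for c in right)
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head, right))
    right_eye = tuple(h + half * r for h, r in zip(head, right))
    return left_eye, right_eye

# Viewer standing at height 1.6 m, looking along -z with +y up.
left_eye, right_eye = eye_positions((0.0, 1.6, 0.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0))
```

Rendering the same scene once from `left_eye` and once from `right_eye`, and showing each image to the matching half of the screen, gives the two slightly different viewing angles that produce the impression of depth.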
Preliminary training: Students in the preliminary training group viewed scenes that were unrelated to electrostatics. In the first scene, students were shown a 3D model of a house, and were asked to rotate the house, view it from all angles, and count the number of windows.
In the second scene, students were shown a 3D model of a single-propeller airplane, and were asked three questions related to angular momentum such as the initial direction of L, and the change in L if certain maneuvers are performed. Screenshots from these training scenes are shown in Fig. 1.
Only the preliminary training group was given these initial scenes. Students took an average of 4 minutes on all training scenes combined. All students took a pretest in 2D. The electrostatics visualizations were displayed on smartphones. The app splits the phone's screen into two halves, one for each eye. Each phone is then placed in a cardboard or plastic viewer. The students can then view the electric systems in stereoscopic 3D. The app utilizes the smartphone sensors so that when the students turn their heads, the system being displayed on the screen rotates, allowing students to see it from any orientation. Students were shown 7 instructional scenes and were told to look around and study the electric field vectors from many angles before moving on. Students were also asked a series of 3 questions within the VR simulation to ensure that they were engaging with the content. Students controlled the rate at which the visualizations progressed.

B. Assessment
Discussions with experienced instructors were used to determine which aspects of electric fields are commonly prioritized in their learning goals for this course. The study team then designed a set of problems on this content that are highly three-dimensional in nature, and are therefore most likely to be aided by stereoscopic 3D treatments. These problems fall into two broad categories: (1) determining the direction of E at locations that are not simply co-planar with a distribution of charge, and (2) understanding features of the vector field as a whole, such as divergence.
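As an illustration of the first problem category, the direction of E at a point that is not coplanar with the charges follows from superposing the Coulomb fields of the individual point charges. The Python sketch below is a minimal example of that calculation. The function names and the charge configuration are hypothetical and are not taken from the assessment itself.

```python
import math

K = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def electric_field(charges, point):
    """Sum the Coulomb fields of point charges at the given point.

    charges: list of (q, (x, y, z)) with q in coulombs, positions in metres.
    Returns the field (Ex, Ey, Ez) in N/C.
    """
    field = [0.0, 0.0, 0.0]
    for q, pos in charges:
        r = [p - s for p, s in zip(point, pos)]  # from charge to field point
        dist = math.sqrt(sum(c * c for c in r))
        if dist == 0:
            raise ValueError("field point coincides with a charge")
        scale = K * q / dist**3
        for i in range(3):
            field[i] += scale * r[i]
    return tuple(field)

def direction(vec):
    """Unit vector along vec."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0:
        raise ValueError("field is zero, so its direction is undefined")
    return tuple(c / norm for c in vec)

# Hypothetical example: two equal positive charges on the x axis,
# field evaluated at a point above the line joining them.
charges = [(1e-9, (1.0, 0.0, 0.0)), (1e-9, (-1.0, 0.0, 0.0))]
e_hat = direction(electric_field(charges, (0.0, 0.0, 1.0)))
```

In this symmetric example the x components cancel and the field points along +z, which is the kind of reasoning the direction questions are meant to probe.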
The study team wrote 13 pretest questions and 11 posttest questions to address the above topics. Because these questions have not been independently validated by other groups, we provide in the present work additional statistics on the reliability of this assessment. Treating the question sets as a preliminary scale for three-dimensional understanding of electric fields, a reliability analysis in SPSS reveals a Cronbach's alpha of 0.91 (or 0.83 for the pretest only, 0.82 for the posttest only). A factor analysis in SPSS revealed a single factor with an eigenvalue greater than 1 (4.8), and this factor explained 43% of the variance. From these data we conclude that although the assessment still needs to be independently validated and could be improved in many ways, it does appear to be internally consistent and statistically well-behaved. We note that the questions used on this assessment are not identical to questions used in Smith et al., although there is some overlap. Some questions from Smith et al. were altered to allow for partial credit if students get some Cartesian components of the electric field direction correct, but not all. Questions were also added that included arrangements of three or four charged particles, such that the particles and point at which the electric field direction is to be determined did not lie in any 2D plane. These differences, coupled with the fact that students from Smith et al. were from an "on-sequence" course, mean that these studies cannot be directly compared, quantitatively.
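For readers unfamiliar with the reliability statistic quoted above, the sketch below computes Cronbach's alpha from a table of item scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The study's values were obtained in SPSS; this Python version is only an illustration, and the small score table is made up.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per student,
    one column per question). Uses sample variances."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    items = list(zip(*scores))  # transpose to one tuple per question
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up scores for 5 students on 4 questions (1 = correct, 0 = incorrect).
example = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
alpha = cronbach_alpha(example)
```

Values close to 1 indicate that the questions tend to rise and fall together across students, which is the sense in which the assessment is described as internally consistent.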

C. Gaming
One additional difference between the present work and Smith et al. is that in Smith et al. students were asked about gaming frequency, but were not asked about the type of game they primarily play. In this work, students were asked how frequently they currently play video games, and were then asked whether the games are primarily 2D, 3D, or both. Common examples of each were given (such as Candy Crush for 2D, and Minecraft for 3D). In this work we are interested in students' experience with 3D gaming, specifically. A composite score S was made from the students' frequency F of game play and the three-dimensionality D of the games: S ≡ F * D. Students with scores S higher than the mean are referred to here as "3D gamers". Sometimes, for brevity, we will refer to them simply as "gamers"; the present work contains no comparison between 3D gamers and other types of gamers.
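The classification rule above can be written out in a few lines. The Python sketch below is illustrative only: the text does not specify how F and D were coded numerically, so the integer codings in the example responses are assumptions, as are the function names.

```python
from statistics import mean

def composite_scores(responses):
    """S = F * D for each (F, D) survey response."""
    return [f * d for f, d in responses]

def classify_3d_gamers(responses):
    """Flag students whose composite score S exceeds the sample mean."""
    scores = composite_scores(responses)
    cutoff = mean(scores)
    return [s > cutoff for s in scores]

# Hypothetical coding (an assumption, not from the study):
# F = 0 (never) to 4 (daily); D = 0 (primarily 2D), 1 (both), 2 (primarily 3D).
responses = [(0, 0), (1, 1), (4, 2), (3, 2)]
flags = classify_3d_gamers(responses)
```

Because the cutoff is the sample mean of S, the label "3D gamer" is relative to the cohort being studied and not an absolute threshold.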

III. RESULTS AND DISCUSSION
We find that the pre-trained group did have higher gains than the untrained group. Because this work was initially motivated by the correlation between gains from VR treatments and gaming experience, it is worth breaking down these scores by gaming experience. Fig. 4 shows the gains from the two treatment groups broken down by prior gaming experience. That trained gamers show positive gains and that trained non-gamers show yet higher gains fits the hypothesis that training can compensate for a lack of familiarity with virtual environments and visuospatial rotations. However, we were surprised to see that gamers who did not receive preliminary training performed worse than any other group, having negative gains. Untrained non-gamers, for example, had small but positive gains. In light of these inconsistent results, it is important to note that there are no statistically significant differences between the performance of gamers and non-gamers in their physics course overall (gamers: 84.0% ± 0.8%, non-gamers: 85.2% ± 1.0%, p = 0.35). The inconsistent interaction effect between training and gaming casts some doubt over the simple hypothesis that prior video game play provides important advantages to students for which training can at least partially compensate. It is unclear whether the hypothesis may yet be true, since this inconsistent interaction effect of gaming and training is not statistically significant (p = 0.41, repeated measures analysis of variance, with treatment and gaming score as between-subjects factors).
FIG. 4. Average post-pre gains for the group given preliminary training, and the group that did not, separated by those reporting high gaming experience and low gaming experience.
To obtain some insight into this paradoxical result, we considered that questions were asked in two formats: some questions were posed in VR while others were posed on a conventional computer monitor (i.e. "in 2D"). These questions can be categorized into four groups: 1) pretest questions asked entirely on a computer monitor, 2) questions posed in VR during electrostatics instruction, 3) posttest questions asked on a computer monitor, and 4) posttest questions posed in VR. Fig. 5 shows student scores on each of these question sets, arranged in chronological order, and split into trained and untrained groups. Pretest scores for the two groups (which were asked in 2D) are consistent within standard error. Likewise, the non-VR posttest scores are consistent between the two groups and the overall result is that the non-VR posttest scores are slightly lower. It is unlikely that the posttest questions are significantly more difficult than the pretest questions, because the two question sets are identical up to occasional swapping of positive and negative charges, or rotation from one high-symmetry point to an analogous high-symmetry point. This being the case, the similarity between the "pre" and non-VR "post" test does seem to imply that the VR intervention had a negligible impact on student learning, regardless of whether students were in the trained or untrained group, and contrary to our expectations.
Perhaps the most interesting feature of Fig. 5 is the large and statistically significant (d = 0.22, p = 0.02) difference between the two treatment groups' scores on the questions posed in VR at the midpoint, during electrostatics instruction in VR. The group that received preliminary training in VR performed approximately as well on the questions posed in VR as they did in the pretest (which was in 2D), whereas those who received no initial acclimation to VR had scores about 16% lower at this mid-point. So in this sense there was a net benefit for students who received the initial acclimation with VR, but the benefit was limited to questions posed in VR. Given that this inquiry was prompted by an apparent connection of the subject matter with student gaming history, we further break down these four question sets by student gaming history. This is shown in Fig. 6.
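The effect size d quoted above is presumably Cohen's d, although the text does not name the estimator. Under that assumption, the Python sketch below shows how such a value is obtained from two groups of scores using the pooled standard deviation. The score lists are made up and are not study data.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = ((na - 1) * variance(group_a) +
                  (nb - 1) * variance(group_b)) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Made-up scores for a trained and an untrained group.
trained = [2, 4, 6]
untrained = [1, 3, 5]
d = cohens_d(trained, untrained)
```

A positive d here means the first group scored higher on average, expressed in units of the pooled standard deviation.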
The dashed lines show the scores by the untrained group, and the solid lines show those of the trained group. Here we see that the drop for "Mid VR" questions is seen in both untrained gamers and untrained non-gamers. We also see that, overall, gamers score higher on the pre-test than non-gamers. Interestingly, by the second set of VR questions, which is the last set of questions that students complete, there is no statistically significant difference between these groups either by gaming or by training.
It bears mentioning that the non-gamer group has proportionally more women than the gamer group. As is shown in Fig. 7, this means that equalization between self-reported gamers and non-gamers by the last question set also corresponds to equalization between males and females.

A. Student feedback
Upon completion of all questions related to electrostatics, but prior to being asked demographics questions, students were asked to rate the VR intervention in three ways. They were asked 1) "How helpful was this intervention?", 2) "How enjoyable was this intervention?", and 3) "How likely are you to recommend this intervention to a friend?". Students were asked to respond using a 5-point scale ranging from -2 ("Highly unhelpful", "Highly unenjoyable", "Highly unlikely", respectively) to +2 ("Highly helpful", "Highly enjoyable", "Highly likely", respectively), with 0 being neutral. In all cases, average scores were positive, very close to +1 ("Helpful", "Enjoyable", "Likely"). Figure 8 shows that both groups (trained and untrained) found the VR instruction equally helpful and equally enjoyable. Although there was a slight difference in how likely the two groups were to recommend the VR to a friend, the difference is not significant after a post-hoc (Bonferroni) correction.
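The Bonferroni correction mentioned above multiplies each p-value by the number of comparisons made (capping the result at 1) before comparing it with the significance threshold. The Python sketch below illustrates the procedure for three comparisons, one per feedback question. The p-values are made up, and the choice of three comparisons and a 0.05 threshold are assumptions, since the text does not state them.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) under a Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant

# Made-up uncorrected p-values for the helpful, enjoyable and recommend ratings.
raw_p = [0.50, 0.90, 0.02]
adjusted, significant = bonferroni(raw_p)
```

In this made-up example the third comparison would pass an uncorrected 0.05 threshold but not the corrected one, which mirrors the situation described for the recommendation rating.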
We find analogous results when splitting student responses according to sex and according to gaming history (not shown). There are no differences in means that are significant after a Bonferroni correction.

IV. DISCUSSION AND CONCLUSION
Smith et al. 1 found that students who reported frequent video game play seem better able to learn from VR-based instruction on electrostatics than "non-gamers". As discussed in that study, these "gamer" students who received VR instruction significantly outperformed students who received equivalent instruction from videos or still images. In designing the present study our hypothesis was that preliminary training in VR on a topic unrelated to electrostatics would help even non-gamer students perform as well on electrostatics problems as gamer students.
If comfort with visuospatial rotations in an electronic context (either through training or gaming history) were all that were required to learn effectively from VR, then one would expect students who reported low video game play and who did not receive the training to have low or negative learning gains. Instead, we found (as summarized in Fig. 4) that students who did report frequent video game play but who did not receive the training improved the least, with an overall worse score on the post-test than on the pre-test. Other results in Fig. 4 were unsurprising, and generally the training helped students to perform better overall on post-tests.
Figure 5 shows that the primary effect of VR training was to increase the trained group's scores on in-VR questions during instruction. The training, however, did not improve scores on the 2D, computer-based post-test questions. Also, trained and untrained groups achieve nearly the same score on the post-test in-VR questions. One possible explanation of this is that when electrostatics questions are posed in VR to students who have never used VR, the experience is overwhelming and they do poorly. But they do better later on when similar questions are asked in VR the second time, such that the midpoint VR experience serves as a kind of preliminary training for the initially untrained students. In other words, all students perform better in VR after one or more exposures to questions in VR. The VR instruction appears to have had no effect on the computer-based post-test. This indicates either that the in-VR training is ineffective, or that learning in the VR context is not transferring to the 2D context, or both.
This breakdown of the data also shows that gaming experience correlates with average scores on these electrostatics assessments, including a pre-test that was given before receiving any electrostatics instruction. Fig. 6 shows that gamers scored around 13% higher on all question types, except the final post in-VR questions, making this a much larger effect than any due to the VR treatment. This result is an important context for the perplexing data presented in Fig. 4 which showed untrained gamer students performing the worst in terms of gains, contrary to our expectations.
The original intent of this study was to compare overall gains between the two treatment groups, and not to compare groups' performances on individual questions or questions in one medium compared to another. It is therefore entirely possible that the observed uptick in performance on the 'post in-VR' questions is simply due to variation in question difficulty.
It seems unlikely, though, that males could score 75% on the pretest, and 77% on the post in-VR questions if their difficulty levels were very different. The apparent equalization of male and female scores on the post in-VR questions thus strongly warrants additional study.
Although we find evidence that VR training is beneficial for student acclimation to VR-based instruction, we conclude that the VR-based instruction in this study has essentially no effect on student understanding overall. This is especially apparent if the final goal is for students to answer inherently 3D questions on 2D media like paper exams and [...] 31 (engineering) that VR-based instruction is not more effective than other media at teaching inherently 3D topics. Since interest in VR-based instruction in physics is unlikely to subside, one takeaway from our study (borne out particularly in Fig. 6) is that VR training does seem to positively affect all students' ability to learn in a VR environment.