Gender differences in the Force Concept Inventory for different educational levels in the United Kingdom

The Force Concept Inventory (FCI) is widely used to investigate the effect of education level on conceptual understanding of Newtonian mechanics but has only recently been scrutinized for gender effects and retention. This study examines both the gender gap in first year physics undergraduates compared to the gap for nonphysicists and the FCI retention after three months. All participants were either studying or working at the University of Sheffield in the UK and had completed a similar compulsory level of secondary education. As expected, the results show that a greater level of education in physics is associated with a larger average FCI score. However, further analysis shows that a gender gap exists at all levels of education. The size of the effect of gender is quantified using Cohen's d and ranges from 0.84 to 1.17, which indicates a large effect due to gender for all levels of education. Although the FCI has been used as a tool to measure learning gains immediately following instruction in Newtonian mechanics, there has been little work to investigate whether this increase in FCI score remains after some time has elapsed. Here the increase in FCI scores is found to persist after three months without mechanics-related teaching, and this retention of FCI scores is independent of gender. Despite this, the gender gap remains large and statistically significant after the three month delay. DOI: 10.1103/PhysRevPhysEducRes.15.020135


I. INTRODUCTION
Educators and policy makers have long been concerned by the underrepresentation of women in science, technology, engineering, and mathematics (STEM) disciplines in both higher education and professional careers [1,2]. In the U.S. only 25% of STEM bachelor's degrees and 26.3% of Ph.D.s were awarded to women in 2011 [3]. A similar trend is also found in the UK [4]. This underrepresentation persists into employment, with only 24% of STEM workers being female (based on both 2000 and 2009 census data in the U.S. [5] and a similar proportion in the UK [6]). The potential underlying causes for this gender gap range from biological factors such as pre- and postnatal exposure to hormones [7] to lower self esteem during introductory science and mathematics courses being a factor behind women leaving science and engineering majors [8]. For more in-depth discussion of the research into the underlying causes of the gender gap the interested reader is directed to the work of Halpern and colleagues in 2007 [9] and a complementary and updated review by Ceci and colleagues in 2014 [3]. However, despite there being no consensus on the factors underpinning the gender gap, a number of strategies have been implemented to narrow it. These include the use of everyday experiences relevant to both males and females [10,11], alternating between group discussion and structured teaching [12], the use of active pedagogies [10,13,14], and providing a more diverse range of assessment and feedback methods [15,16].
The role of assessment is critical for teaching physics at college and university through problem solving [17][18][19][20][21] but the need for practice and the development of effective strategies [19] mean that many students completing an introductory course in physics only develop a weak understanding of the underlying concepts [22][23][24][25][26]. There are a number of systems for evaluating conceptual understanding of mechanics in physics [25,27,28] but the most well known and widely used is the Force Concept Inventory (FCI) developed by Hestenes, Wells, and Swackhamer [24] based on their earlier mechanics baseline test [29].
Since their initial work a number of other researchers have used the FCI to measure potential learning gains following instruction in an introductory physics course. The first large scale study ($N = 6542$) was conducted by Hake [30], who defined the normalized learning gain as

$$\langle g \rangle = \frac{\langle S_f \rangle - \langle S_i \rangle}{100 - \langle S_i \rangle}, \tag{1}$$

where $\langle S_i \rangle$ and $\langle S_f \rangle$ are the initial and final average percentage scores, respectively. This study showed that courses taught using traditional methods ($N_T = 14$) (such as didactic lectures, recipe-driven laboratories, and algorithmic-problem exams) led to low positive learning gains ($\langle\langle g \rangle\rangle_T = 0.23 \pm 0.04$) whereas 85% of "interactive engagement" courses ($N_{IE} = 41$) showed medium learning gains ($\langle\langle g \rangle\rangle_{IE} = 0.48 \pm 0.14$). Similar large-scale studies have demonstrated comparable normalized learning gains as measured using the FCI [31], and there is some evidence that these learning gains persist for a number of years after education [32].
The FCI has not been without its criticisms. Huffman and Heller [33,34] conducted a factor analysis of the data presented by Hestenes and colleagues in their original work and concluded that it is unclear whether the FCI tests an understanding of key concepts or whether students achieve high scores through "…small bits and pieces of knowledge…" or familiarity with the context of the question. A more recent factor analysis [35] suggests that a five-factor model accounts for 40% of the variation in their sample ($N = 2109$) but that some questions in the FCI conflate multiple factors, most notably Newton's first and third laws, and that the kinematics concepts originally listed by Hestenes and colleagues do not appear to form a distinct factor following factor analysis. Both studies argue that although the FCI may be a useful diagnostic tool, it should not be used as a summative assessment given that there is still some uncertainty as to what underlying constructs the FCI actually measures.
With these criticisms in mind the FCI has subsequently been used in a number of different ways. Savinainen and Scott [36] demonstrate how this diagnostic tool can be used to shape and refocus teaching activities, whereas a number of researchers in the U.S. [37][38][39] and UK [40] have used the FCI to explore the participation and performance gap between males and females studying physics in higher education. This gender gap has also been observed across the world and at both secondary and postsecondary levels [10,14,30,41], with the average score of females being consistently lower than that of males.
In this study the gender gap in first year physics undergraduates is compared to the gap for nonphysicists, all studying or working at the University of Sheffield in the UK. The retention of increased FCI scores is also examined following the U.S.-based work of Francis, Adams, and Noonan [32]; here, however, the cohort is based in the UK and a gender gap in retention is also investigated.

II. METHOD
A. Sample demographic
A significant number of studies examining the FCI are based in the United States whereas the participants in this study are from a different educational background and environment. In the United Kingdom all children are required to attend school up to the age of 16, at which point they sit for a set of national General Certificate of Secondary Education (GCSE) qualification examinations. Participants who sat their mandatory qualifications at 16 before 1988 would have sat a different but equivalent qualification known as ordinary level or "O level." At age 16 the participants involved could either leave education or continue by taking further qualifications known as A levels (advanced levels), marked on a letter-based scale A*-E. These are typically seen as the entrance qualification for university study, with three qualifications being a common entry requirement for most courses. Students who wish to study physics at the University of Sheffield need a minimum of one A and two B grades, which must include physics and mathematics, although the majority exceeded this with an average grade profile of AAB. There are also distinct differences between UK and U.S. university courses. Students studying a course in the UK have already selected their degree course at admission and will spend their time only studying their chosen subject (or subjects in the case of dual degrees).
In this study two participant groups were recruited. Group A ($N = 125$) was recruited via a volunteer email distributed to all staff and students with an invitation to take part in an online questionnaire. The invitation email stated that anyone who is currently studying for or already holds a degree in physics should not take part. The data were checked for any respondents stating a degree or higher in physics and any such data were excluded from analysis. Group B was recruited from a class of 174 undergraduate students in their first year of a physics degree, all meeting the minimum A-level qualifications described previously.

B. Data collection
A revised version of the original FCI, known as the FCI v95, was used in this study. Recent work by Traxler and colleagues [42] demonstrated that certain questions in the FCI v95 displayed a "…gender-unfairness…" that gives rise to an inflated gender gap, although they do note that a gender gap persists even after they removed the gender-unfair questions from analysis. In this study the "gender-unfair" questions highlighted by Traxler and colleagues were excluded from analysis but were present when the FCI was administered. Items 6, 9, 12, 14, 15, 22, 23, and 27 were highlighted by Traxler and colleagues based on their work combined with previous studies, and they also identified items 21, 24, and 29 based on their work alone.
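The exclusion step described above could be sketched as follows. This is a minimal illustration rather than the scoring procedure actually used in the study: the function name, the dictionary-based response format, and the placeholder answer key are all assumptions introduced here, and the key shown is deliberately not the real FCI key. The 30-item length of the FCI v95 is taken as given.

```python
# Items excluded from analysis, as listed in the text (1-based FCI v95 item numbers).
EXCLUDED_ITEMS = {6, 9, 12, 14, 15, 22, 23, 27, 21, 24, 29}
N_ITEMS = 30  # the FCI v95 is assumed here to contain 30 items


def reduced_fci_score(responses, answer_key):
    """Return the percentage score over the retained items only.

    `responses` and `answer_key` are hypothetical dicts mapping
    item number -> chosen option letter.
    """
    retained = [i for i in range(1, N_ITEMS + 1) if i not in EXCLUDED_ITEMS]
    correct = sum(1 for i in retained if responses.get(i) == answer_key[i])
    return 100.0 * correct / len(retained)


# Placeholder key for illustration only; this is NOT the real FCI answer key.
dummy_key = {i: "A" for i in range(1, N_ITEMS + 1)}
responses = dict(dummy_key)
responses[6] = "B"  # wrong answer on an excluded item: does not affect the score
responses[1] = "B"  # wrong answer on a retained item: lowers the score
score = reduced_fci_score(responses, dummy_key)
```

Under these assumptions 19 items are retained, so each retained item is worth roughly 5.3 percentage points, while responses to the excluded items are collected but ignored.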

Group A
For group A the reduced FCI v95 test was reconstructed into a Google form with the following structure. First, participants were provided with information about the study and asked to give the required consent. Next the FCI questions were presented with only one question on screen at any one time. Finally, participants were asked to complete a set of optional demographic questions including gender identity (select from "female," "male," or "prefer to self define"), age, highest qualification in physics, highest qualification other than physics, and department or professional services team (free text).
Group A was subsequently split into two groups: the "GCSE or O level" group, whose highest qualification in physics is either GCSE or O level, and the "A-level" group, who have an A level in physics. The GCSE or O-level group was made up of 75 participants ($n_{\text{female}} = 47$; $n_{\text{male}} = 28$) and the A-level group had 54 participants ($n_{\text{female}} = 16$; $n_{\text{male}} = 38$). No participant chose to self-define their gender identity.

Group B
All students enrolled in a physics degree (including dual degrees) are required to take the compulsory module "Mechanics, Waves and Optics" in their first semester at the University of Sheffield. This established course is split into four subunits (mechanics, waves, optics, and special relativity), each of which is taught by a different lecturer through didactic lectures (12 lectures per subunit) with additional weekly 1 h small group tutorials. The reduced FCI v95 was delivered via the Blackboard Virtual Learning Environment at three time points: (1) Precourse: during the first week of first semester, (2) Postcourse: the week immediately after completion of the mechanics component of the module (6 weeks after the precourse test), (3) Delay: three months after the postcourse test. Ninety-three students out of the class of 174 completed the FCI at all three time points and thus make up group B ($n_{\text{female}} = 26$; $n_{\text{male}} = 67$). They used a unique identification number to allow longitudinal analysis to be undertaken but this also prevented the use of engagement measures or course attainment as a covariate within analysis. Students were not provided with their results after any of the tests. Scores did not contribute to any of their university course grades and students were informed that participation was voluntary. As this group comprises participants with a similar educational background and current educational situation they were only asked to voluntarily provide their gender identity after the delay test. An independent samples Mann-Whitney test showed no statistically significant difference between A-level grades of male and female undergraduate students in group B. The average grade profile achieved by both male and female groups was one grade above the degree entrance requirement, typically AAB (compared to the entry requirement of ABB).
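As an illustration of the Mann-Whitney comparison of A-level grades mentioned above, the U statistics for two independent samples can be computed by direct pair counting. This is only a sketch: the numeric grade coding and the example values are hypothetical and are not data from this study, and no p-value is computed. In practice a statistics package (for example `scipy.stats.mannwhitneyu`) would normally be used.

```python
def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y) for two independent samples by pair counting.

    Each (x, y) pair contributes 1 to U_x if x > y and 0.5 if x == y.
    """
    u_x = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in sample_x
        for y in sample_y
    )
    u_y = len(sample_x) * len(sample_y) - u_x
    return u_x, u_y


# Hypothetical numeric grade codes (illustrative only, not study data).
grades_group_1 = [3, 4, 5]
grades_group_2 = [3, 3, 4]
u_1, u_2 = mann_whitney_u(grades_group_1, grades_group_2)
```

The two statistics always sum to the product of the sample sizes; values of U close to half that product are what one would expect when neither group tends to have higher grades.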

III. RESULTS
A. Effect of gender on scores for each level of qualification
A statistically significant difference was found between males and females for GCSE or O level (group A), A level (group A), and precourse (group B), with males outperforming females in all cases. For group B this gender difference persisted immediately after completion of a mechanics course as well as three months after completion of the course. The results of all analyses are shown in Table I. For both male and female groups there is an increase in average FCI score with increasing level of qualification in physics. This is in agreement with the work of Coletta and Phillips who found a correlation between SAT and FCI scores at four universities in the U.S. [43].
Cohen's d [44] was calculated to indicate the size of the effect of gender for all of these comparisons. In all cases the effect size is considered to be large when using Cohen's original suggested interpretation. However, more recent work by Rodriguez and colleagues [45] cautions that effect sizes are not absolute indicators in physics education research. They note that care must be taken when comparing effect sizes observed in different studies published across the research field due to the uniqueness of the specific groups and conditions being compared.
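For readers less familiar with the measure, a common form of Cohen's d for two independent groups divides the difference in means by the pooled standard deviation. The sketch below assumes this pooled-SD variant; the text does not state which variant was used in the analysis, and the example scores are invented for illustration.

```python
import math
import statistics


def cohens_d(group_1, group_2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd


# Invented percentage scores (illustrative only, not study data).
d_example = cohens_d([70, 80, 90], [60, 70, 80])
```

In this toy example the means differ by 10 points and the pooled standard deviation is 10 points, giving d = 1.0, which falls above the 0.8 threshold that Cohen suggested for a "large" effect.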

B. Effect of gender on FCI and retention
Wilcoxon signed rank tests between precourse and postcourse scores as well as between precourse and delay scores showed a statistically significant difference for both male and female cohort groups. These results are summarized in Table II. Comparison between postcourse and delay scores showed no significant difference for either male or female cohort groups. This result is in line with previous studies [32]; here, however, the retention is also shown to be independent of gender.
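The paired comparison above relies on the Wilcoxon signed rank statistic, which can be sketched as follows. This is an illustrative implementation only, with invented scores: it drops zero differences, assigns average ranks to tied absolute differences, and returns the positive and negative rank sums without a p-value. A statistics package (for example `scipy.stats.wilcoxon`) would normally be used for the full test.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are discarded; tied absolute differences share
    the average of the ranks they span.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover the run of equal absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Invented paired scores (illustrative only, not study data).
pre_scores = [10, 12, 9, 15, 11]
post_scores = [13, 12, 8, 19, 13]
w_plus, w_minus = signed_rank_sums(pre_scores, post_scores)
```

A large imbalance between the two rank sums indicates a systematic shift between the paired measurements, whereas similar sums are consistent with no change.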
Although the normalized learning gains [30] defined by Eq. (1) are included in Table II to allow for comparison with other studies these values should be interpreted with care. Participants in group B demonstrated a high average on the FCI when compared to a majority of studies conducted in the U.S., which is likely due to the specialist nature of UK degrees, namely, that all 93 participants were registered for a full physics degree course. These high averages mean the normalized learning gain is more susceptible to skew and therefore Cohen's d is a better representation of the effect observed.
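One way to see why high initial averages make the normalized learning gain delicate is to note that the gain divides the raw improvement by the room left for improvement, (final - initial)/(100 - initial), so the denominator shrinks as the initial score rises. The sketch below uses invented percentages, not values from Table II, and the function name is introduced here purely for illustration.

```python
def normalized_gain(pre_percent, post_percent):
    """Normalized gain: raw improvement divided by the maximum possible improvement."""
    if pre_percent >= 100:
        raise ValueError("normalized gain is undefined for an initial score of 100%")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


# The same 5-point raw improvement from two different starting points.
gain_low_start = normalized_gain(40.0, 45.0)   # 5 / 60
gain_high_start = normalized_gain(80.0, 85.0)  # 5 / 20
```

With these illustrative numbers an identical 5-point improvement yields a normalized gain of about 0.08 from a 40% start but 0.25 from an 80% start, so small score fluctuations in a high-scoring cohort translate into comparatively large changes in the normalized gain.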
The effect sizes between precourse and both postcourse and delay are large for female participants and medium-large for male participants. The normalized learning gains suggest that there is little difference between the female and male groups, in contradiction to the statistical analyses above; however, as Rodriguez et al. [45] demonstrate, the effect size is the preferred measure compared to normalized learning gain.

IV. DISCUSSION
This study adds to the wide pool of existing research and shows that a gender gap and its subsequent narrowing following instruction also exist among UK-based physics undergraduates. The initial FCI score is considerably higher in this work compared to research undertaken in the U.S.; however, this is likely due to the differences between the U.S. and UK undergraduate courses, namely, that in the UK students are enrolled on a single degree programme from the start rather than selecting minors and additional credit courses.
This gender gap is also seen at different levels of education in physics from compulsory secondary school education to first year undergraduate students, and the magnitude of the effect is large in all cases. More interestingly, the effect sizes before and immediately after instruction in mechanics are very similar (0.94 and 0.92, respectively); however, the effect decreases slightly after a 3 month delay. This decreased effect size of 0.79 shown in Table I is still statistically significant and suggests the gender gap may decrease with longer time delays. At present, however, no other studies have investigated the gender gap over time and, consequently, the reasons behind this slight decrease are a matter of interest. It may be that different levels of male and female attrition or study participation introduce a sampling bias, or that after three months female students have developed more of a sense of belonging, which is known to affect attainment and engagement [46]. Furthermore, there have been many attempts to reduce the gender gap through interactive engagement techniques [37] and the results found here suggest that educators should consider the time between instruction and testing, particularly when scheduling formative assessments.
Finally, this work also shows that FCI scores persist after instruction, in agreement with the U.S.-based study by Francis, Adams, and Noonan [32]. This retention is also shown to be independent of gender, with females showing a large effect size gain whereas males show only a medium-large gain. This effect is obscured if only the normalized learning gain is considered [30] due to the large initial FCI scores. Despite the larger gains for female participants the average score remains 9% lower than that of male participants following instruction. It would be interesting to examine whether the interactive engagement methods that have been shown to reduce the gender gap remain effective after instruction is complete.