An equity investigation of attitudinal shifts in introductory physics

We report on seven years of attitudinal data using the Colorado Learning Attitudes about Science Survey from University Modeling Instruction (UMI) sections of introductory physics at Florida International University. This work expands upon previous studies that reported consistently positive attitude shifts in UMI courses; here, we disaggregate the data by gender and ethnicity to look for any disparities in the pattern of favorable shifts. We find that women and students from statistically underrepresented ethnic groups are equally supported on this attitudinal measure, and that this result holds even when interaction effects of gender and ethnicity are included. We conclude with suggestions for future work in UMI courses and for attitudinal equity investigations generally.


I. INTRODUCTION
The University Modeling Instruction (UMI) curriculum [1], developed and studied at Florida International University (FIU), has produced an uncommon pattern of consistently positive shifts in student attitudes toward physics [2]. The case for studying student attitudes and epistemologies has been made at greater length elsewhere [2,3,4]; here, we summarize those arguments but largely take as given that improving students' attitudes toward physics is one relevant dimension of a curriculum's success. However, education researchers must be cautious about overgeneralizing results, and one such overreach is to claim that a benefit is received by all students when in fact it accrues only to those from majority groups. FIU, a Hispanic-serving institution with a large fraction of women in the calculus-based modeling sections, provides an important opportunity to investigate this aspect of the UMI curriculum with a diverse student body. Section II discusses the context of gap-based analyses in education research, outlining some potential pitfalls of this approach and why we have chosen it here, and also summarizes some of the most relevant results on attitude surveys. Section III outlines the context of data collection and the research questions considered. Section IV summarizes our results, and Section V concludes with suggestions for future equity investigations of attitudinal or conceptual measures, cautioning against forms of "gap gazing" that can further marginalize underrepresented groups.

A. Gaps analyses
Examination of performance differences, or looking for "gaps" between groups, is not without controversy in education research. As outlined by Gutiérrez [5] in mathematics and Danielsson [6] in physics, gaps analyses run the risk of essentializing student identities by overgeneralizing (e.g., "all women"). Gutiérrez argues that gap analyses often implicitly reinforce a deficit model in which students' differences are presumed to be the result of inadequacies in preparation, skill, or ability. Further, she argues, this frames students from different backgrounds in opposition to one another. Lubienski [7], on the other hand, contends that investigations of gaps are critically important to inform education policy and that it would be "irresponsible" to stop making gaps analyses. Following Lubienski, we feel that it is not just valuable but essential for teachers and curriculum developers to question whether the benefits of instruction are distributed equitably among statistically underrepresented and majority student groups. Some gap-based analyses, when thoughtfully conducted, have deepened our understanding of the mechanisms behind systemic performance differences on traditional academic measures. One key example is stereotype threat, originally uncovered by testing different framings of a difficult verbal test given to white and African American college students [8]. This landmark study and many that followed (for one review, see Ref. [9]) revealed a previously invisible barrier for women and students from statistically underrepresented racial and ethnic groups. Aware of negative stereotypes about their groups and invested in disproving them, these students face extra cognitive load from that awareness, and often show performance drops in the very subjects they care about most [10].
Stereotype threat research has led to a richer understanding of how to frame classroom tasks in a manner that better supports all students. This work, including some in physics education research [11], would not have been possible without a willingness to investigate the causes of systematically observed performance differences between groups. Indeed, while Gutiérrez outlines pitfalls of gaps analyses, she also gives suggestions for avoiding them [5]. These suggestions include a greater focus on intervention work and on teaching and learning environments that support students from diverse racial, ethnic, and socioeconomic backgrounds. In the spirit of the second category, we focus our attention on data collected from University Modeling Instruction classes at FIU.
Rodriguez et al. [12] discuss three predominant models of equity in the context of physics education research: Equity of Fairness, Equity of Parity, and Equity of Individuality. Under the Equity of Fairness model, students from all populations should experience similar gains or losses. Equity of Fairness models would preserve preexisting gaps, such as the widely documented gender difference in Force Concept Inventory (FCI) pretest scores [13]. In the Equity of Parity model, students from one population might enter with lower scores on some measure, but all should leave with the same score distribution. Interventions striving for gap closing work from an Equity of Parity model; the study by Lorenzo, Crouch, and Mazur [14], showing a reduction or elimination of the FCI gender gap over a semester, is one example. Finally, Equity of Individuality investigations explicitly avoid group comparisons and instead focus on understanding individual excellence. An example of work in this category is a study by Goertzen, Brewe, and Kramer [15] that uses case studies to examine several students' increasing levels of participation in the physics learning community at FIU, a large Hispanic-serving institution. Gap-based analyses are unable to speak to Equity of Individuality models, but may still provide important insight into Equity of Fairness or Equity of Parity questions; research such as this paper, which explores differences in attitudinal shifts between groups, is relevant to both.
Previous work has outlined the epistemological goals of the UMI curriculum, which frames modeling as the key activity of scientists [1,16]. UMI classes have shown favorable student outcomes in conceptual understanding [17], in self-efficacy [18,19], in student social network measures [20], and in student attitudes toward physics [2] and toward engaging in physics [15].
We expand on the latter work here by examining whether these attitudinal gains are shared equally by women and by students from black, Hispanic, Native American, and Pacific Islander ethnicities. As of this paper's writing, all four ethnic groups are statistically underrepresented in the sciences and in physics, relative to the demographics of the United States population. Collected American Physical Society statistics highlight the situation, showing approximately 20% of physics degrees go to women, and less than 10% to African American, Hispanic, or Native American students [21].
In the text, we adopt this language of "statistically underrepresented" to avoid the possibly deprecating connotations of "underrepresented minorities." The term also more accurately describes the prevalence of students of color in the broader United States, while reflecting that FIU is an uncommon example of a university where statistically underrepresented students are a majority.

B. Student attitudes
A variety of studies now document student attitudes in introductory university physics [3,22,23] and the effects of students' attitudes and epistemologies on their conceptual gains [22,24], use of content knowledge [25], and choice of courses and majors [24,26]. However, these results are not always reported through a lens of demographic factors. Some published CLASS results have shown more favorable pretest attitudes and shifts for men compared to women [23,27], while information about the effects of race or ethnicity is generally not available. This situation mirrors the literature on concept inventories, where gender gap results are often published [13], but ethnic representation is only rarely examined [17].
Despite a dearth of research on differential attitudes toward physics or science generally, the research on stereotype threat introduced above cautions us that attitudinal differences are very salient for students from statistically underrepresented groups. A serious long-term consequence of stereotype threat is the filtering effect it applies to participation: students from negatively stereotyped groups, over time, often disidentify with the threatened area [28]. A negative shift in attitudes toward a subject can be an important self-preservation tactic for students who are threatened by stereotypes. By devaluing the domain, they minimize the risks that the stereotype-predicted poor performance would otherwise pose to their self-image. While the CLASS does not measure physics identity directly, gender patterns in published results are troubling [23,27]. Additionally, a study asking students to report both their own and a scientist's expected answers on the CLASS found that women were equally or better able to select the "scientist" response, but the gap between that response and their own answers was greater [29]. Because prolonged stereotype threat affects domain identification, differentially lower initial attitudes or negative attitudinal shifts may be an important warning to instructors of disengagement among students from threatened groups.
Research from the University of Colorado has shown that initial (pre-university instruction) student attitudes are strongly correlated with pursuing a physics major [26]. It is possible that strong positive shifts have a similar effect on recruiting students to the major. This question remains open in part because demonstrating consistently positive shifts has proven difficult. However, at a more fine-grained scale, a positive shift in attitudes toward physics learning has been linked with more central membership in the physics community in FIU's rapidly growing physics major population [15].
We have ample motivation to examine patterns of positive attitudinal shifts, as potential signals of growing student investment and participation in physics. However, to accurately report promising findings, we must also ask whether any such benefits are equally received by all groups of interest. In this paper, we investigate precourse to postcourse attitude scores and shifts for students in calculus-based Physics I (mechanics) courses. From previous work, we know that the University Modeling Instruction courses are equitable by the Equity of Fairness model for Force Concept Inventory gains by ethnicity, but not by gender [17]. In other words, student gains were independent of their ethnic representation, but a gender gap widened over the semester (so not all student groups experienced equal gains). Here we extend the equity question to attitudinal shifts. This investigation contributes to the knowledge base on impacts of student attitudes by first exploring differential attitudes across statistically underrepresented student groups, and then by asking how instruction shifts student attitudes among these groups.
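The difference between the Equity of Fairness and Equity of Parity models can be made concrete with a toy comparison of two groups' mean scores. The sketch below is illustrative only: it assumes that "similar" can be judged by comparing means against a hypothetical tolerance `tol`, whereas the analyses in this paper rely on significance tests and effect sizes. The class and function names, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupScores:
    """Mean pre- and postcourse scores for one student group (hypothetical)."""
    name: str
    pre: float
    post: float

    @property
    def gain(self) -> float:
        return self.post - self.pre


def equity_of_fairness(a: GroupScores, b: GroupScores, tol: float) -> bool:
    """Fairness: both groups gain (or lose) about the same amount,
    so any preexisting gap is preserved."""
    return abs(a.gain - b.gain) <= tol


def equity_of_parity(a: GroupScores, b: GroupScores, tol: float) -> bool:
    """Parity: both groups leave with about the same score,
    regardless of where they started."""
    return abs(a.post - b.post) <= tol


# Hypothetical numbers, not data from this study.
group_a = GroupScores("A", pre=50.0, post=60.0)
group_b = GroupScores("B", pre=40.0, post=50.0)  # same gain, gap preserved
print(equity_of_fairness(group_a, group_b, tol=2.0))  # True
print(equity_of_parity(group_a, group_b, tol=2.0))    # False
```

In this toy case the groups satisfy Fairness but not Parity; a group that started lower and gained more, ending level with group A, would satisfy Parity but not Fairness.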

III. METHODS
FIU is a large, minority-serving institution (54 000 students, 61% Hispanic, 13% black, in Spring 2014) with a primarily commuter student body. Over the past ten years, the Physics Education Research Group has guided a series of structural reforms in the introductory physics courses, including the addition of University Modeling Instruction sections of the calculus-based sequence. The data presented in this paper are drawn from introductory Physics I courses and were collected from the Fall 2007 to Fall 2013 semesters. Table I shows the demographics of the student sample. The gender ratio is much closer to parity in UMI sections than in traditional lecture physics courses at FIU, while the distribution of students' ethnic representation is very similar between the two course formats. Because of the popularity of the Modeling Instruction sections, students are admitted by lottery.
The Colorado Learning Attitudes about Science Survey [23] is a 42-item Likert-scale instrument, where students select their level of agreement or disagreement with statements about physics. Answers are compared to an expert response key to give a "percentage favorable" score. The CLASS was administered on paper at the beginning and end of each term, and responses were filtered to retain only matched pre/post pairs, which are necessary to calculate shifts. We look for pretest, post-test, and shift differences between students who are statistically well- or overrepresented in physics (male, Asian, and white students) and those who belong to statistically underrepresented groups (female, black, Hispanic, Native American, and Pacific Islander students). Several ethnicities are represented by only a few students in our data set. To alleviate the statistical difficulties of small sample sizes, and to look for broad differences by representation rather than fine-grained comparisons between groups, the ethnicity component of the analysis distinguishes only between statistically well- or overrepresented (SR) and statistically underrepresented (SUR) categories.
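A minimal sketch of the scoring, matching, and grouping steps described above follows. It assumes that the five-point responses are collapsed into agree, neutral, and disagree, and that a response counts as favorable when its collapsed direction matches the expert key (our assumption about the scoring details). The four-item `EXPERT_KEY` is a hypothetical stand-in, not the published CLASS key, and all helper names are illustrative.

```python
from typing import Dict

# Hypothetical four-item expert key (item number -> expert direction).
# The real CLASS key is not reproduced here; items absent from the key are unscored.
EXPERT_KEY: Dict[int, str] = {1: "agree", 2: "disagree", 3: "agree", 4: "disagree"}

# Representation categories as defined in the text.
SR_ETHNICITIES = {"Asian", "white"}
SUR_ETHNICITIES = {"black", "Hispanic", "Native American", "Pacific Islander"}


def collapse(likert: int) -> str:
    """Collapse a 1-5 response (1 = strongly disagree, 5 = strongly agree)."""
    if not 1 <= likert <= 5:
        raise ValueError(f"response out of range: {likert}")
    if likert <= 2:
        return "disagree"
    if likert == 3:
        return "neutral"
    return "agree"


def percent_favorable(responses: Dict[int, int], key: Dict[int, str] = EXPERT_KEY) -> float:
    """Percentage of answered, scored items whose collapsed response matches the key."""
    scored = [item for item in responses if item in key]
    if not scored:
        raise ValueError("no scored items answered")
    favorable = sum(collapse(responses[item]) == key[item] for item in scored)
    return 100.0 * favorable / len(scored)


def ethnic_representation(ethnicity: str) -> str:
    """Map a self-reported ethnicity onto the SR / SUR categories."""
    if ethnicity in SR_ETHNICITIES:
        return "SR"
    if ethnicity in SUR_ETHNICITIES:
        return "SUR"
    raise ValueError(f"unrecognized ethnicity: {ethnicity}")


def matched_shifts(pre: Dict[str, Dict[int, int]],
                   post: Dict[str, Dict[int, int]]) -> Dict[str, float]:
    """Keep only students with both tests; shift = post score - pre score."""
    return {sid: percent_favorable(post[sid]) - percent_favorable(pre[sid])
            for sid in sorted(pre.keys() & post.keys())}


# Hypothetical responses: s2 lacks a post-test and s3 lacks a pretest,
# so only s1 contributes a shift.
pre_tests = {"s1": {1: 4, 2: 4, 3: 3, 4: 2}, "s2": {1: 5, 2: 1, 3: 5, 4: 1}}
post_tests = {"s1": {1: 5, 2: 2, 3: 4, 4: 2}, "s3": {1: 2, 2: 5, 3: 1, 4: 4}}
print(matched_shifts(pre_tests, post_tests))  # {'s1': 50.0}
```

Neutral responses are treated as neither favorable nor unfavorable here, so they lower the percentage favorable without counting as expert disagreement.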
We seek to answer two research questions: (1) To what extent does gender or ethnic representation influence students' percentage of expertlike CLASS responses in University Modeling Instruction? (2) To what extent is there an interaction between gender and ethnic representation? To address the first question, we disaggregate student pretests, post-tests, and shifts in percentage favorable responses on the CLASS. In addition to checking for statistically significant differences in these values between groups, we follow Rodriguez et al. [12] in looking for significant effect sizes. We measure effect size using Cohen's d [30]:
$$ d = \frac{\mu_1 - \mu_2}{\sigma_{\text{pooled}}} $$
Here, $\mu_1$ and $\mu_2$ are the means of the two groups being compared (e.g., mean precourse scores for men and women, $\mu_{M,\text{pre}}$ and $\mu_{F,\text{pre}}$), and $\sigma_{\text{pooled}}$ is the pooled standard deviation of the two groups. The effect size provides an indicator of "practical significance," and thus serves as a necessary accompaniment to statistical significance when reporting claims about gaps between groups [12]. The second question arises because the intersection of gender and racial or ethnic identity is known to pose additional challenges for women of color in the sciences [31]. To address this point, we use a linear regression model including an interaction term for gender and ethnicity, and investigate whether it explains a significant amount of the variance in postcourse attitudes.
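Cohen's d as defined above can be computed directly from two groups' scores. This sketch assumes the standard (n - 1)-weighted pooled standard deviation and adds one common large-sample approximation to the standard error of d, giving an interval of roughly d ± 1.96 SE. The paper does not state how its error bars were computed, so the interval is an assumption rather than the authors' method; function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pooled_sd(x: Sequence[float], y: Sequence[float]) -> float:
    """Pooled standard deviation from the two sample variances, weighted by n - 1."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Cohen's d = (mean(x) - mean(y)) / pooled SD."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def cohens_d_se(d: float, n1: int, n2: int) -> float:
    """A common large-sample approximation to the standard error of d."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))


def d_interval(x: Sequence[float], y: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for d; if it spans zero, the difference is not distinguishable from none."""
    d = cohens_d(x, y)
    se = cohens_d_se(d, len(x), len(y))
    return d - z * se, d + z * se


# Hypothetical scores for two small groups.
print(round(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 3))  # -0.632
print(d_interval([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```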

IV. RESULTS
Figure 1 shows the significant and positive differences between pre- and postcourse responses in the modeling classes. Disaggregating by gender and by ethnic representation, we see that all subgroups show significant positive shifts. On close examination of the histograms for pre- and postcourse percentage favorable scores (not pictured), some ceiling effect may exist in the postcourse scores. However, it is not notably more pronounced for any of the subgroups. Figure 2 elaborates on the disaggregated results by showing percentage favorable shifts for all students, by gender, and by ethnic representation. Average shifts do not vary by ethnic representation, but women do have a significantly higher average shift than men (see Table II in the Appendix for values and standard errors). Figure 3 shows the effect sizes, Cohen's d, of group differences on pretest and post-test. We see that for both gender and ethnicity, on precourse and postcourse administrations of the CLASS, the effect sizes of the differences are small ($|d| \lesssim 0.2$) and the error bars span zero. This overlap indicates that there is no meaningful difference in the precourse or postcourse means of men compared to women, or of statistically represented compared to statistically underrepresented ethnicities. As advocated by Rodriguez et al. [12], we find that examination of effect sizes adds nuance beyond that provided by null hypothesis significance testing. Figure 1 shows that women's precourse averages were somewhat lower and their postcourse averages somewhat higher compared to men, producing a statistically significant higher average shift (Fig. 2). The nonsignificant effect sizes in Fig. 3, however, indicate that this is not a practically significant difference in the distributions by gender. One final caveat comes from the sample sizes, which are comparable for gender but uneven for ethnic representation (with relatively few statistically represented students at FIU).
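To make the sample-size caveat concrete, the approximate power of a two-sided, two-sample comparison of means can be computed with a normal approximation. This sketch uses only Python's standard library. The group splits in the usage lines are hypothetical divisions of the 264-student regression sample reported below, not the counts in Table I, and `approx_power` is an illustrative name.

```python
from statistics import NormalDist


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test of means
    for a true standardized effect d with group sizes n1 and n2."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = abs(d) * (n1 * n2 / (n1 + n2)) ** 0.5  # expected test statistic
    # Probability of rejecting in either tail.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


total = 264
for d in (0.2, 0.5):  # Cohen's conventional small and medium effects
    balanced = approx_power(d, total // 2, total // 2)
    uneven = approx_power(d, 232, 32)  # hypothetical uneven split
    print(f"d = {d}: balanced {balanced:.2f}, uneven {uneven:.2f}")
```

For a fixed total, power depends on n1·n2/(n1 + n2), which is largest for equal groups; an uneven split such as few SR students therefore lowers the chance of detecting a real effect of a given size.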
As per Cohen's notes on statistical power [30], there is some risk that a small effect for gender or a medium-size effect for ethnic representation may have been missed due to sample size constraints. Accordingly, the effect size findings contextualize Figs. 1 and 2, but will become more robust as more data accumulate. Section V revisits effect size in the context of equity models.
Finally, to check for possible interactions of gender and ethnicity that might be overlooked when considering each factor individually, we use a linear regression model:
$$ \text{Post} = \beta_0 + \beta_{\text{Pre}}\,\text{Pre} + \beta_{\text{Gender}}\,\text{Gender} + \beta_{\text{EthRep}}\,\text{EthRep} + \beta_{\text{Gender}\times\text{EthRep}}\,(\text{Gender}\times\text{EthRep}) + \varepsilon $$
Here, Post and Pre represent the overall percent favorable scores, Gender is coded as F or M [32], and EthRep is coded SR or SUR for statistically represented or underrepresented ethnic groups, respectively.
Fitting this model to the sample of 264 students, we find that only the coefficient for Pre is significant: $\beta_{\text{Pre}} = 0.57$, 95% CI = (0.46, 0.67), $p < 0.01$. For the full model, $R^2 = 0.32$, indicating that substantial variance remains unexplained. Neither gender, ethnic representation, nor their interaction was a significant predictor of postcourse expertlike beliefs once a student's precourse beliefs were accounted for. This result supports the nonsignificant effect sizes found above, and clarifies that no detectable gender-ethnicity interaction was hidden by splitting the data along those categories.
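The interaction model above can be fit by ordinary least squares. The sketch below codes Gender as 1 for F and 0 for M, and EthRep as 1 for SUR and 0 for SR (a dummy coding we assume; the paper's coding is described only in note [32]). It builds the design matrix with an interaction column and reports coefficients, classical standard errors, normal-approximation 95% intervals, and R^2. It runs on synthetic data in which only Pre matters by construction and does not reproduce the study's results.

```python
import numpy as np


def design_matrix(pre, female, sur):
    """Columns: intercept, Pre, Gender (1 = F, 0 = M), EthRep (1 = SUR, 0 = SR),
    and the Gender x EthRep interaction."""
    pre = np.asarray(pre, dtype=float)
    g = np.asarray(female, dtype=float)
    e = np.asarray(sur, dtype=float)
    return np.column_stack([np.ones_like(pre), pre, g, e, g * e])


def fit_ols(X, y):
    """Return OLS coefficients, their classical standard errors, and R^2."""
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    r2 = 1.0 - rss / np.sum((y - y.mean()) ** 2)
    sigma2 = rss / (n - p)                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # assumes X has full column rank
    return beta, np.sqrt(np.diag(cov)), r2


# Synthetic illustration only: these are NOT the study's data.
rng = np.random.default_rng(0)
n = 264
pre = rng.uniform(40, 90, n)
female = rng.integers(0, 2, n)
sur = rng.integers(0, 2, n)
X = design_matrix(pre, female, sur)
true_beta = np.array([25.0, 0.6, 0.0, 0.0, 0.0])  # only Pre matters by construction
post = X @ true_beta + rng.normal(0, 10, n)

beta, se, r2 = fit_ols(X, post)
names = ["Intercept", "Pre", "Gender", "EthRep", "Gender:EthRep"]
for name, b, s in zip(names, beta, se):
    # Approximate 95% interval using a normal critical value.
    print(f"{name:>14}: {b:7.3f}  95% CI ({b - 1.96 * s:.3f}, {b + 1.96 * s:.3f})")
print(f"R^2 = {r2:.2f}")
```

In practice a statistics package would also report p-values; for example, statsmodels' formula interface fits the same model with `smf.ols("Post ~ Pre + Gender * EthRep", data=df).fit()`, where `Gender * EthRep` expands to both main effects and their interaction.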

V. DISCUSSION AND CONCLUSIONS
Previous studies of student conceptual gains in introductory physics have pointed to a disparity in scores between male and female students [13,17]. Results vary on whether these gaps persist in reform-based classes, where various features of the learning environment might be expected to support traditionally marginalized students. Although there is important debate about the degree to which gap gazing is useful or appropriate in education research, a difference in gains by gender or ethnicity is troubling because it suggests that not all students are receiving the claimed benefits of reform efforts.
In the attitudinal study reported here, the picture is somewhat different from that for conceptual measures. Returning to our research questions:
1. To what extent does gender or ethnic representation influence students' percentage of expertlike CLASS responses in University Modeling Instruction? There is no evidence that female students, or those from statistically underrepresented ethnicities, have meaningfully lower or higher precourse scores, postcourse scores, or shifts in percentage favorable beliefs on the CLASS. Closer examination of the score distributions does show some evidence of a ceiling effect on the postcourse CLASS, but the effect does not appear to be stronger for men or for students from SR ethnicities (which, had it been the case, might have artificially suppressed a gap). It would be very useful to disaggregate the scores by gender and ethnicity for a broader sample of classes, where high pretest scores are less prevalent, and for non-modeling courses (more on this below).
2. To what extent is there an interaction between gender and ethnic representation? In a linear model of postcourse attitude scores that includes gender, ethnicity, and their interaction, none of these coefficients is statistically significant. Only students' precourse attitudes are a significant predictor in the model, and even with this predictor the model accounts for only 32% of the total variance in postcourse attitude scores. So far as we can detect with these data, women from statistically underrepresented ethnicities have pre- and postcourse attitude profiles similar to those of their peers in other groups.
Revisiting the two models of equity discussed in Section II, the modeling classes are supportive of student attitudes in the Equity of Fairness sense, where all groups show similar gains. No precourse differences in distribution were detected, nor did traditional majority groups show disproportionate gains, so Modeling Instruction is also supportive of student attitudes by the Equity of Parity model.
As noted above, one possible explanation for FCI gender gaps is stereotype threat, which is known to depress the performance of women and students from underrepresented ethnic groups on many academic tasks. However, a key component of the threat is the perceived risk of doing badly on a task where one will be judged. Three features of the current study, some of which may be idiosyncratic to the FIU context, are worth discussing as they pertain to stereotype threat. First, stereotype threat may be mitigated when students who are typically affected by it, such as those from statistically underrepresented groups, are in the majority. This explanation fails to account for the lack of precourse gender differences in our sample, and in this study we do not make such a claim. A second point is that, even when statistically underrepresented students are in the majority, as is the case at FIU, the instructor still holds a position of power from which they make evaluative decisions. However, an attitudinal survey, where students are asked to rate their beliefs rather than to choose one correct answer, may be perceived as a less failure-prone task and thus not trigger the threat. Finally, the precourse attitudes of students entering the modeling classes are already very favorable, higher than those of lecture students at the same institution and at the high end of typical precourse scores reported for the CLASS (Adams et al. [23], Tables V and VIII).
Possible explanations include greater student buy-in at the beginning of the semester, as students must apply and be selected by lottery due to the popularity of the UMI sections. A related hypothesis is that the same informal network of peers that passes information about the course may also confer a higher expectation of success, leading to a self-efficacy boost that registers on the CLASS. To help account for the first possibility, a more comprehensive attitude survey of lecture students in the same cohort would be useful: tracking students who unsuccessfully applied to modeling sections, and comparing their CLASS precourse scores with those who found seats in modeling, could detect whether the UMI classes somehow attract more "physics people." Observationally, however, this is somewhat unlikely, as many UMI students are on premedical paths and have no initial interest in a physics career.
Returning to the question of gap gazing, looking for performance differences between groups should be done carefully, because it risks problematizing already marginalized students. But until the field of physics accurately reflects the diverse talents of the population, and until effects such as stereotype threat are no longer detectable, it is important for education researchers to address whether their reforms truly are for all students. Building on this awareness, a constructive way to address the problems of underrepresentation in science is to examine successful curricula and learning environments so that lessons may be drawn from positive examples.
In this work, we have examined the favorable attitudinal shifts reported in UMI courses, asking whether they are equitable among students of different genders and ethnicities. We find that they are and, somewhat surprisingly, that this holds even on the precourse attitude survey, where more negative attitudes have been reported for women in other studies. While it would be unreasonable to attribute this precourse parity to the UMI curriculum, it suggests that research could fruitfully expand beyond the boundaries of classroom pre- and post-tests to investigate the learning networks and communities that may transmit information and expectations to future students. The results reported here, taken together with previous FCI and odds-of-success comparisons for the same courses [17], also caution against taking any one test score (attitudinal, conceptual, or otherwise) as the sole measure of a student group. Multiple measures of success are needed to understand, measure, and value the many things that students learn in physics courses.