Beneath the numbers: A review of gender disparities in undergraduate education across science, technology, engineering, and math disciplines

[This paper is part of the Focused Collection on Gender in Physics.] This focused collection explores inequalities in the experiences of women in physics. Yet, it is important for researchers to also be aware of and draw insights from common patterns in the experiences of women across science, technology, engineering and mathematics (STEM) disciplines. Here, we review studies on gender disparities across college STEM on measures that have been correlated with retention. These include disparities in academic performance, engagement, self-efficacy, belonging, and identity. We argue that observable factors such as persistence, performance, and engagement can inform researchers about what populations are disadvantaged in a STEM classroom or program, but we need to measure underlying mechanisms to understand how these inequalities arise. We present a framework that helps connect larger sociocultural factors, including stereotypes and gendered socialization, to student affect and observable behaviors in STEM contexts. We highlight four mechanisms that demonstrate how sociocultural factors could impact women in STEM classrooms and majors. We end with a set of recommendations for how we can more holistically evaluate the experiences of women in STEM to help mitigate the underlying inequities instead of applying a quick fix.


I. INTRODUCTION
Throughout this Focused Collection, researchers are discussing the challenges facing women¹ in physics in the hope of generating productive conversations and interventions that change these experiences. These conversations can be strengthened by extending their scope beyond physics to include findings from other science, technology, engineering and mathematics (STEM) disciplines. Extensive work on the experiences of women has been done across STEM disciplines and several common patterns of inequalities have emerged. In many STEM disciplines, women are underrepresented at the undergraduate level and this underrepresentation is often exacerbated at the graduate, postdoctoral, tenure-track faculty, and practicing STEM professional level [5,6]. Even if women persist to become STEM faculty and working professionals, these women tend to have lower visibility in their fields than their male colleagues: they participate less than males at academic conferences, particularly in more prestigious settings [7,8], and are less likely than males to hold the more prestigious first or last author position on manuscripts [9]. These broad findings illustrate that many issues facing women in physics transcend disciplinary boundaries, making the experiences of women across STEM fields highly relevant to the experiences of women in physics.
Increasing the retention of women in undergraduate STEM disciplines may be a first step in the complex and daunting task of improving the representation of women as STEM professionals. Currently, women switch out of undergraduate STEM majors at a higher rate than their male peers [6,10] and this may be especially true for female underrepresented minority students [11]. These women are not leaving college, but moving to non-STEM majors, indicating that there is something about college-level STEM that is driving them away. Even women who persist through their undergraduate majors enter graduate school at lower rates than their male peers in 75% of STEM disciplines [12]. If we can prevent undergraduate STEM programs from being a selective filter favoring males [13], we may be able to increase the number of women at the professional level in STEM fields.
¹ It is important to acknowledge that although we focus on males and females in this article, gender identity exists on a spectrum and that more than two genders exist in the human experience. In addition, even individuals who identify as women do not all share a common experience. Gender is a complicated identity based on individuals' internal experiences of who they are. Thus, individuals can vary in the degree to which they identify with a particular gender, how important gender is to their identity, the gender roles associated with their gender, and how their gender identity influences their experience in different settings such as a classroom [1-3]. In addition, gender is only one of a multitude of social identities that make up who we are and how we react in certain settings. Other identities, such as race and/or ethnicity, also influence a student's experience of their gender in the classroom [4].
Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
However, we argue that only focusing on the numbers of women in STEM may be shortsighted as college retention data alone underestimates and potentially masks gender disparities that exist across STEM. These more subtle gender disparities could impact students in ways that influence the long-term retention of women in graduate school, postdoctoral training, and academic positions. Further, retention numbers alone do not provide any insights into the underlying mechanisms that contribute to this drop-off in numbers of women in STEM.
We propose that we need to look deeply into the underlying factors that influence retention across STEM disciplines in addition to physics in search of commonalities and potential solutions. Therefore, we critically reviewed the published literature to identify gender disparities in multiple factors shown to influence retention in STEM. Although our review is not comprehensive, these findings give us insights into the mechanisms underlying retention disparities. Further, we present a framework that STEM instructors and researchers could use to critically examine relationships among the different types of gender inequalities that may exist in their classrooms. Through this framework, we describe a set of possible ultimate causes of these gender inequalities that are situated in theories from social psychology. Targeting these deep underlying causes could be a more effective strategy for mitigating gender disparities in college physics classrooms.
Critical evaluation of the current literature on gender disparities in college STEM also highlights the pressing need for additional research. It is our hope that this article can provide a deeper conceptual understanding of the complexities facing females in STEM and stimulate a robust dialogue among researchers and educators across disciplines on ways to improve the experience of female students in college STEM classrooms. Furthermore, we urge discipline-based education researchers to move beyond the natural sciences to begin conversations and collaborations with social psychologists and sociologists to better study some of these underlying mechanisms. We have much to learn and share with each other.

II. GENDER INEQUALITIES IN COLLEGE STEM BEYOND RETENTION
The number of students enrolled in a college STEM major or course is only a superficial diagnosis of gender inequalities in that discipline or course. While low numerical representation is indicative of a problem, the lack of a numerical gender gap does not necessarily indicate an absence of gender inequalities. For example, in biology, where 60% of undergraduate students on average are women, simply examining the numerical representation of female students may lead to the assumption that there are no gender inequalities. Yet, we have observed gender disparities in achievement, participation, and comfort in multiple biology classrooms (see Eddy, Brownell, and Wenderoth [14]). Additionally, the numerical underrepresentation of a group of students does not tell us anything about why there are so few students. In this article, we focus on possible underlying factors such as academic achievement, engagement, and affective measures. While we are not claiming that these are the only factors that matter, we present them as examples of growing evidence for the types of gender inequalities that extend beyond coarse-grained enrollment numbers. For each factor, we also comment on aspects of data collection, analysis, and interpretation that may impact the findings.
Here, we only included papers that explored these factors in the context of the undergraduate STEM experience to document gender gaps specific to this population. We did not include studies of gender gaps in K-12 or in other college settings (e.g., studies done with nonmajors outside of STEM classrooms). Although there have been many studies on math and science attitudes of undergraduate women not majoring in STEM, we chose not to include these studies because there is evidence that many women in STEM differ in critical ways from women not in STEM. For example, many STEM women hold different beliefs about STEM-related abilities and gender than non-STEM women [15], and there is some evidence that STEM women tend to be less gender normative in terms of gender roles [16-18]. In one study of computer science majors, female majors did not differ from male majors on a sex roles scale, but did significantly differ from female nonmajors [16]. Thus, to be conservative, we only included studies done with STEM majors or in STEM classrooms.
The papers collected here constitute a representative, although not comprehensive, sample of work in the area of gender inequalities in STEM. A challenge with a synthesis of gender inequalities is that while some studies report gender gaps in undergraduate STEM as the main finding, other papers include this information as a supplemental section in a course or program evaluation paper. This makes it difficult to collate the findings on gender inequalities because many papers do not even use gender as a keyword and thus are not archived by search engines in this way. We have done our best to find as many papers as possible, but want to acknowledge that we likely have not captured everything that has been published.

A. Academic performance
A common way to measure the academic climate for women in college STEM courses is to compare the academic performance of females to that of males.
Academic performance across STEM is a predictor of retention [19,20], so a consistent achievement gap between students could be an important factor in why more women than men leave STEM. However, one study comparing women who leave STEM to women who remain in STEM did not find a significant difference in academic performance between these two populations [21]. This finding suggests that although academic performance is important for the retention of all students, it may not, on its own, explain the gender differences in persistence in STEM.
In the studies that we reviewed from a range of STEM disciplines including engineering, biology, chemistry, physics and math, we do not see a consistent gender gap in performance across or within disciplines (Table I). There is also no consistent pattern comparing lower division to upper division courses. Overall there may be a trend toward men outperforming women, but these results are mixed. A closer examination of the statistical analyses and outcome variables used in these studies suggests a more nuanced pattern of gender inequalities and a need for more research in this area.
The majority of studies examining differences in academic performance are not randomized trials. Instead, they are observational studies of students who have self-selected into classrooms. In these quasi-random study systems, it is critical for researchers to control for differences between students. Lack of random assignment of groups to a treatment can produce systematic biases that confound study results [39,40]. For example, an important systematic bias exists in the population of males and females who attend college: college-bound women systematically have higher high school GPAs than men who attend college [41]. This implies that, all else being equal, females ought to do better in their college classes than males. Thus, even if males and females appear to be performing equally in a course, an achievement gap might still be observed if students were matched by an indicator of ability (such as prior performance in high school or college courses).
In the majority of studies that controlled for a measure of student ability, males seem to outperform female students (Table I). The only achievement gap that favored females after controlling for student ability was found in a physics class for nonmajors [27]. In the studies that do not control for ability, there is either no achievement gap or one in favor of females (Table I). This pattern holds true even within a discipline: in biology, the one study of introductory biology classes that did not use a control for student ability did not find an achievement gap [24], but our own work in introductory biology courses that controlled for student performance in their prior college classes found an achievement gap that favored males [14].
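The effect of controlling for prior ability can be illustrated with a minimal sketch. All records, band labels, and function names below are hypothetical and invented for illustration; they are not data from any study reviewed here. The sketch also swaps in simple stratification for the regression-based controls that the reviewed studies typically use: it compares the raw male-minus-female gap in course scores with the gap among students in the same prior-ability band.

```python
from collections import defaultdict

# Hypothetical records: (gender, prior-ability band, course score).
# Women are concentrated in the "high" band, mirroring the higher
# incoming GPAs described in the text. All values are invented.
RECORDS = [
    ("F", "high", 84), ("F", "high", 84), ("F", "high", 84), ("F", "low", 72),
    ("M", "high", 90), ("M", "low", 78), ("M", "low", 78), ("M", "low", 78),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def raw_gap(records):
    """Mean male score minus mean female score, ignoring prior ability."""
    male = [score for gender, _, score in records if gender == "M"]
    female = [score for gender, _, score in records if gender == "F"]
    return mean(male) - mean(female)


def stratified_gap(records):
    """Male-minus-female gap computed within each prior-ability band,
    then averaged with weights proportional to band size.
    Assumes every band contains at least one man and one woman."""
    bands = defaultdict(list)
    for record in records:
        bands[record[1]].append(record)
    total = len(records)
    return sum(raw_gap(group) * len(group) / total for group in bands.values())


print(raw_gap(RECORDS))         # gap without any control
print(stratified_gap(RECORDS))  # gap among students matched on band
```

With these invented numbers the raw gap is 0.0 while the stratified gap is 6.0 points in favor of men, which is the situation described above: apparent parity in course performance can coexist with a gap among students of matched prior ability.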
In addition to controlling for academic differences between students, the outcome variable that is measured can influence whether or not researchers observe an achievement gap. For example, Kost-Smith, Pollock, and Finkelstein [42] observed an achievement gap in physics classrooms when they used exam performance as their outcome variable, but not when they used overall class performance because women tended to do better than men on homework assignments. Even in studies that only use exams as their outcome variable, the characteristics of exam items used can influence the presence and size of observed gender gaps on those exams (Wright et al. [43], McCullough [28]). In addition, how researchers choose to measure learning can impact achievement gaps. For example, Willoughby and Metz [44] tested multiple ways of measuring learning gains and found achievement gaps between males and females when using normalized gains (post-test minus pretest, divided by the maximum possible gain), but not other gain calculations. Thus, researchers should carefully consider what outcome variable is most informative to their research question. For example, course grade can be useful for determining who can continue in the major if there are grade cutoffs, but exam or concept inventory performance may be a better indicator of actual student learning.
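To make the dependence on the gain calculation concrete, the following sketch computes three common gain measures for two hypothetical groups. The scores and function names are assumptions made for illustration, not values from Willoughby and Metz [44]. Scores are assumed to be percentages, and the normalized gain follows the convention usually attributed to Hake, in which the raw gain is divided by the maximum possible gain.

```python
def raw_gain(pre, post):
    """Post-test minus pretest."""
    return post - pre


def relative_gain(pre, post):
    """Raw gain divided by the pretest score (undefined when pre == 0)."""
    return (post - pre) / pre


def normalized_gain(pre, post, max_score=100.0):
    """Raw gain divided by the maximum possible gain
    (undefined when pre == max_score)."""
    return (post - pre) / (max_score - pre)


# Hypothetical class means as (pretest, post-test) percentages.
GROUPS = {"M": (50.0, 70.0), "F": (40.0, 60.0)}


def gender_gaps(groups):
    """Male-minus-female difference for each gain measure."""
    measures = {
        "raw": raw_gain,
        "relative": relative_gain,
        "normalized": normalized_gain,
    }
    return {
        name: measure(*groups["M"]) - measure(*groups["F"])
        for name, measure in measures.items()
    }


for name, gap in gender_gaps(GROUPS).items():
    print(f"{name}: {gap:+.3f}")
```

In this invented case both groups improve by 20 points, so the raw gain shows no gap, the relative gain favors women, and the normalized gain favors men. The same pre- and post-test data can therefore support different conclusions depending on the calculation chosen.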
Finally, while academic performance gives an account of how a student is achieving, this is still a relatively coarse-grained approach that does not provide insights into why differences in performance might exist. The gap in achievement could stem from a number of different factors and achievement gaps in different settings may have nonoverlapping causes. In addition, the lack of an achievement gap may lead to incorrect assumptions that gender equality has been achieved when in reality other unexamined gender disparities remain that could impact long-term retention in STEM.

B. Engagement
While less well studied than academic performance, there is also evidence for gender inequalities in engagement in college STEM (Table I). A student's engagement in STEM can include participation in the classroom (e.g., answering an instructor's question or talking in small groups), as well as participation in activities that are STEM related outside of the classroom (e.g., study groups, clubs, research). Participation and engagement within one's discipline has been shown to predict retention [45,46]. In addition, participation can be an important indicator of other affective measures such as a student's perception of and anxiety about performance in a STEM course [35,47]. In this article, we focus on participation in the classroom setting because most studies have been done in this context.
Although there is a rich literature exploring participation differences based on gender in non-STEM college classrooms [48-50], the evidence for gendered patterns of in-class participation in STEM classrooms is sparse (Table II). Studies using student self-reported participation in college STEM classrooms have shown that female students across disciplines report lower participation or less comfort with participation when compared with males [35,36]. This appears to be true even in STEM disciplines with large numbers of females: we found self-reported participation differences between males and females in multiple large introductory biology classes [38]. In this study, men felt more comfortable answering instructor questions in front of the whole class and were more likely to prefer to take on the role of leader in small group work compared to women. However, self-report is a measure of the subjective experiences of students and does not necessarily correlate with observed differences. Thus, it is important for studies to also directly measure participation in STEM classrooms.
Fewer studies have observed participation in college STEM classrooms, although the studies that do exist are large in scope, span the range of common class sizes, and include lower and upper division courses. Sternglanz and Lyberger-Fick [37] is the earliest study that we found on student participation in STEM college classrooms. The researchers in this study observed 16 "natural science" courses that had predominantly male enrollment and were taught by male instructors. They found that male undergraduates were more likely than female undergraduates to answer instructor questions and initiate interactions with the instructor in class. Society and gender roles have changed since this study was published in 1977, yet even in more modern studies we find similar patterns of gendered engagement. A more recent study explored student engagement in classroom discussions in 26 lower and upper division humanities and science (biology and physics) courses at one institution [34]. These courses were all generally small in size relative to most science classes (the average class size for their "large" classes was 35 students) and, unfortunately, researchers did not disaggregate the science and humanities courses in their analysis. Across both classroom types, they found that gender inequalities in participation grew over time in college, with males speaking up more as they progressed from introductory to upper-division courses. To our knowledge, the most recent observational study of gender gaps was work by us in 26 introductory biology classrooms at an R1 university [14]. The class sizes in this study ranged from 159 to approximately 500 students. Although small group work occurred frequently in many of these classrooms, the study focused only on student responses to instructor-posed questions to the whole class. Even though the class composition was 60% female, female voices were heard answering questions only 40% of the time. A similar pattern was seen for students asking questions of the instructor in class, although there was more variability in this among classes so the effect was not significant. Taken together, these three studies suggest a gender gap in participation that consistently favors males in STEM classrooms. Further work across more classrooms and STEM disciplines will be necessary to assess the robustness of this pattern.
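One way to express a participation gap of this kind is to compare women's share of answers with their share of enrollment. The sketch below is a minimal illustration under stated assumptions: the counts (300 students, 180 of them women, and 100 observed answers, 40 of them from women) are hypothetical values chosen only to match the 60% and 40% figures quoted above, and the function names are invented. The binomial tail probability additionally assumes that each answer is an independent draw from the class, which real classrooms are unlikely to satisfy if a few students answer repeatedly.

```python
from math import comb


def representation_ratio(answers_by_women, total_answers,
                         women_enrolled, total_enrolled):
    """Women's share of answers divided by their share of enrollment.
    A value of 1.0 means participation proportional to enrollment."""
    answer_share = answers_by_women / total_answers
    enrollment_share = women_enrolled / total_enrolled
    return answer_share / enrollment_share


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts chosen to match the percentages in the text.
ratio = representation_ratio(40, 100, 180, 300)
tail = binomial_lower_tail(40, 100, 180 / 300)

print(f"representation ratio: {ratio:.2f}")
print(f"P(40 or fewer answers from women under proportional answering): {tail:.2e}")
```

A ratio well below 1.0 together with a small tail probability would indicate that women answer less often than their enrollment predicts, but neither number says whether the cause lies with students, instructors, or both.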
A major limitation of these observational studies is that it is impossible to determine the cause of the gender difference in participation from observation alone. Class participation in these studies involves two people: the instructor and the student. Thus, gender disparities could be due to instructors not calling on females as much as males (as has been seen in some K-12 classrooms [70,71]), female students not feeling confident enough to raise their hands to answer a question [72], or a combination of both. These are two different causes of participation differences and would necessitate different interventions. More sophisticated observational studies that go beyond simple counts of who is answering questions could better disentangle these influences. For example, documenting who is raising their hands, who shouts out answers, and who is called on could distinguish between student and instructor contributions to this participation bias. Researchers could also track individual students to determine if the gendered pattern is a result of high participation by a few individuals or is more evenly distributed among the majority of students in the class. Finally, the quality of both instructor questions and student answers could be assessed. It has been demonstrated in K-12 literature that instructors hold different expectations for some populations relative to others [73] and this may impact who they call on to answer a certain type of question.
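The kind of record-keeping suggested here could look like the following sketch, which is only one possible coding scheme and not a protocol from any of the cited studies. The `QuestionEvent` structure, the gender codes, and the observations are all hypothetical. For each instructor question, the observer records the genders of the students who raised their hands and the gender of the student who was called on. Comparing the share of calls that would go to women if instructors chose at random among raised hands with the share that actually went to women separates student volunteering from instructor selection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionEvent:
    """One instructor question (hypothetical coding scheme)."""
    raised_hands: List[str]  # gender code of each student who raised a hand
    called_on: str           # gender code of the student who was called on


def female_call_shares(events):
    """Return (expected, observed) share of calls going to women.

    expected: mean fraction of raised hands belonging to women, i.e. the
              share women would receive if instructors picked at random
              among volunteers. Assumes every event has at least one hand.
    observed: fraction of questions on which a woman was actually called.
    """
    expected = sum(
        event.raised_hands.count("F") / len(event.raised_hands)
        for event in events
    ) / len(events)
    observed = sum(1 for event in events if event.called_on == "F") / len(events)
    return expected, observed


# Hypothetical observations from a single class session.
EVENTS = [
    QuestionEvent(["F", "M", "M", "M"], "M"),
    QuestionEvent(["F", "F", "M", "M"], "M"),
    QuestionEvent(["F", "M"], "F"),
    QuestionEvent(["M", "M"], "M"),
]

expected, observed = female_call_shares(EVENTS)
print(f"expected share of calls to women: {expected:.4f}")
print(f"observed share of calls to women: {observed:.4f}")
```

If the expected share is already below women's share of enrollment, women are volunteering less often; if the observed share falls further below the expected share, instructor selection is adding to the gap. Tracking individual students rather than gender codes would extend the same structure to the question of whether a few students account for most participation.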
A second limitation of these studies is that they only focus on one context of classroom participation: participation in front of the whole class. As more classrooms move to student-centered instruction [74], students are increasingly working with each other in pairs or small groups. Gendered patterns of participation and the quality of this participation in these contexts need to be explored as well. Chi and Wylie's [75] Interactive, Constructive, Active, Passive (ICAP) framework is one way that researchers could conceptualize the quality of student interactions in small groups and explore whether gender influences student contributions. In this framework, interactive participation (where students take turns talking and build on each other's ideas) is hypothesized to lead to greater student learning than constructive participation (where one student in the group does all the talking). Monitoring participation in small group work is challenging for instructors, particularly in large classes where it is nearly impossible for them to engage with every group. However, these contexts need to be explored to fully characterize gender inequalities in participation that may be meaningful for retention in STEM.

C. Affective measures: self-efficacy, belonging, and science identity
Undergraduate students hold beliefs about their own abilities in STEM, their peers' abilities, and about STEM disciplines in general, all of which can influence how students perceive their relationship to their major. These attitudes and experiences can be important predictors of retention in STEM. For example, a survey across two cohorts of STEM majors at one university found that students' perceptions of their own fit within a STEM discipline influenced the retention of women in the major, especially in male-dominated fields [76]. Below we outline three affective measures that are related to STEM retention: self-efficacy, belonging, and disciplinary identity. These measures are not inclusive of all the possible affective measures impacting retention, but they are the ones that have been most studied in STEM classrooms. The amount of research that has been done on each measure in STEM settings varies greatly and conclusions that can be drawn are preliminary. However, themes are beginning to emerge (Table II).

Self-efficacy
Self-efficacy is a measure of the strength of one's belief that one can complete a task or goal. There are many types of self-efficacy, but the studies in this review focus on academic self-efficacy: one's confidence in one's ability to master academic subjects and coursework [77]. A meta-analysis of 38 studies relating self-efficacy to achievement and 18 studies relating self-efficacy to persistence in academic settings found that self-efficacy significantly impacted both measures [78]. This effect remained even when researchers restricted the sample to the subset of studies done in a college context. Thus, identifying differences in self-efficacy between males and females in STEM classrooms has the potential to help explain differences in retention.
We found eight studies that documented self-efficacy at the beginning of a student's college career. Five out of the eight studies focused on engineering and demonstrated that male first year undergraduates had higher self-efficacy than females. Besterfield-Sacre and colleagues' [57] study spanning 15 institutions was the most robust. It showed that incoming male engineering majors rated their confidence in their basic engineering skills higher than female engineering majors across all institutions (although this difference was not significant at three institutions). This pattern of higher confidence for men remained even after a year of coursework in engineering, although the pattern was weaker: gender differences in self-efficacy were only present at seven of the institutions. In studies at single institutions, Felder et al. [29] and Jagacinski [58] both found patterns of lower academic self-efficacy for first year female engineering majors relative to male engineering majors. Concannon and Barrow [59], on the other hand, found no difference between male and female engineering majors in some measures of self-efficacy, but did find gender differences in self-efficacy in engineering-specific abilities. Studies on academic self-efficacy with first year students in other STEM disciplines have also shown that men hold higher self-efficacy than women. In a study that tracked self-efficacy in a first year chemistry course for mixed-science majors at a single institution, researchers found that the gender gap in self-efficacy was mediated by a race by gender interaction: no gap was present between male and female students who were black or white, but there was a significant gap between Asian American and Latina(o) males and females, which decreased over the term [61]. This study is of note as it was the only study in our sample that included prior academic achievement as a control in their model when predicting self-efficacy. Thus, the gaps in this study were between males and females of matched ability in terms of their performance on the math SAT. It is also the only study in our sample that explored the interaction between gender and race or ethnicity. In contrast to this study, Dalgety and Coll [60] found that the gender gap in self-efficacy in a chemistry class for majors actually increased over the term. Finally, a study with nonmajors in college algebra courses across 10 institutions found that males had higher self-efficacy than females [33].
Studies on advanced undergraduates (third and fourth years) across STEM reveal that the academic self-efficacy of men remains higher than that of women even beyond the first year of college. A study of third year students across STEM disciplines who were all enrolled in a supplemental program found that female STEM majors reported significantly lower academic self-efficacy compared with male STEM majors, particularly on study skills, test-taking skills, and coping with test anxiety [52]. A study across 49 different upper division STEM courses found that women reported lower confidence in their academic ability in courses in their discipline compared to men [51]. Finally, in a study comparing computer science (CS) majors to nonmajors, female CS majors reported lower self-efficacy than male majors and had lower self-efficacy than even male non-CS majors on a measure of self-efficacy related to working with computers [53].
Another type of self-efficacy that likely impacts retention is the belief that one can excel at a job in one's chosen field. Two studies have examined this type of self-efficacy in engineering. Cech et al. [56] measured students' self-efficacy at four different institutions during their first year of college and then four years later to determine whether they persisted in an engineering major. They found that a student's belief that he or she could excel in an engineering career significantly predicted retention in engineering, and female engineering majors had lower confidence in this ability to excel than males. The second study, conducted at one institution, also found that male engineering majors had higher career self-efficacy than female majors [59].
Each of the studies described above only captures a snapshot of self-efficacy and not how an individual student's self-efficacy changes over the duration of their STEM major. It is plausible that the gender gaps in self-efficacy seen in advanced undergraduates are actually smaller than they were initially, but cross-sectional and longitudinal studies are needed to address this question. However, if students are not individually followed through time in longitudinal studies, it is still not possible to disentangle two explanations for the observed changes in self-efficacy: (i) a student's self-efficacy increases over time or (ii) students with lower self-efficacy dropped out of the major, so their lower self-efficacy scores are not averaged in the advanced undergraduate group. We did find one study that followed almost 2000 students longitudinally as they advanced through the college engineering sequence [54]. These researchers showed that the gap in self-efficacy between female and male engineering majors actually widened as students advanced through their major [54]. However, they did not track individual students in their analysis (which could be done by accounting for repeated measures), and this limits the interpretation of these results.
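The difference between explanations (i) and (ii) can be shown with a small sketch. The student identifiers, scores, and function names are hypothetical and serve only to illustrate the argument. In this invented cohort no student's self-efficacy changes, yet the group mean rises because the two students with the lowest scores leave the major before the second measurement.

```python
def mean(values):
    """Arithmetic mean of a non-empty collection."""
    values = list(values)
    return sum(values) / len(values)


def cross_sectional_change(early, late):
    """Difference between group means at the two time points,
    ignoring which students are present at each one."""
    return mean(late.values()) - mean(early.values())


def within_student_change(early, late):
    """Mean change for students measured at both time points
    (a repeated-measures view of the same data)."""
    tracked = [student for student in late if student in early]
    return mean(late[student] - early[student] for student in tracked)


# Hypothetical self-efficacy scores on a 1-5 scale.
FIRST_YEAR = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 5.0}
THIRD_YEAR = {"c": 4.0, "d": 5.0}  # students "a" and "b" left the major

print(cross_sectional_change(FIRST_YEAR, THIRD_YEAR))  # group means rise
print(within_student_change(FIRST_YEAR, THIRD_YEAR))   # individuals unchanged
```

Here the cross-sectional comparison reports an increase of 1.0 while the within-student change is 0.0, so the apparent growth is entirely an artifact of attrition. Only an analysis that links each student's scores across time can tell the two explanations apart.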
In summary, these studies suggest that women in STEM report lower self-efficacy beliefs than men at the beginning of their college career, but more longitudinal studies are needed to further probe this area.

Belonging
Belonging is the experience of feeling accepted as a member of a group. Sense of belonging can be in reference to multiple different groups: one can belong to a discipline as a whole (e.g., belong in science), belong in a major (e.g., physics major), belong in a classroom or other community (e.g., Physics 101 class), or even belong to a small working group in the context of the larger class (e.g., one's lab group).
The connection between belonging and retention has been demonstrated in two studies using very different methods. In one study, female undergraduate students in a college level calculus class who reported a higher sense of belonging to math were more likely to express an intent to pursue further math courses beyond the calculus series [79]. The second study used an observational approach to measure belonging [80]. The researchers postulated that the amount of time that faculty in STEM departments spent socializing with colleagues was a proxy for their sense of belonging and that this time would predict how engaged a faculty member felt in his or her job (the measure of intent to persist in this study). Forty-five faculty members wore recording devices and researchers found that women who engaged in fewer social conversations with their male colleagues felt more disengaged from their job [80]. Fewer social conversations seemed to have no impact on men. Thus, there is some evidence that sense of belonging can influence persistence in STEM.
There have been few studies on belonging in a discipline done with students in STEM classrooms and the results of these studies are mixed (Table II). Stout et al. [65] documented belonging in five calculus-based physics classrooms at one institution and found that male students had a stronger sense of belonging to physics than female students. A second study of college level calculus classes at another institution found that a sense of belonging to math for all students declined over the semester and the belonging of female undergraduates was significantly lower than male undergraduates in the middle of the course [79].
We found three additional studies that examined sense of belonging of STEM students in contexts other than their discipline. One study experimentally manipulated the gender ratio of participants at a hypothetical STEM conference and found that female STEM majors felt a lower sense of belonging at that conference when the gender ratio favored men [62]. This study implies that women may feel a lower sense of belonging in STEM environments where they perceive that they are in the minority. A second study focused on the sense of belonging at an institution for STEM transfer students [64]. Female STEM transfer students experienced a greater difference between their sense of belonging and their desired level of belonging at this institution than did male STEM transfer students. Another study of almost 150 first year graduate students across STEM disciplines found that at the start of their first year, female students reported a higher sense of belonging in their program than their male peers [63]. Although at first glance this may seem like a positive, it actually suggests that only women with a heightened sense of belonging consider graduate school whereas men with both higher and lower senses of belonging pursue graduate school. Thus, belonging may be a limiting factor for women in pursuing STEM careers beyond their bachelor's degree.

Disciplinary identity
Disciplinary identity is an indicator of the extent to which students perceive their own identity to be aligned with the identity of practitioners in their discipline. The relationship between this disciplinary identity and retention has been implied by one study in STEM that spanned three populations: advanced undergraduates, graduate students, and postdoctoral scholars [81]. Within each of these populations, individuals with greater science identity also expressed greater commitment to becoming a research scientist.
Only a few studies of disciplinary identity have been done with STEM students. In general, these studies indicate that there is a relationship between gender and the strength of one's STEM identity (Table II). In a sample of incoming first-year students across 40 institutions, students on average responded neutrally to three questions asking 'do you consider yourself a [biology, chemistry, or physics] person', but this response was moderated by gender and race for students intending to pursue a STEM career after college [66]. For these STEM-oriented students, white women identified more with biology than white men, while men and women of color did not differ from each other in biology identity; men and women of all races and ethnicities identified equally with chemistry; and white men had a stronger identification with physics than white women, but men and women of color did not differ from one another in physics. One point of caution in the interpretation of this study's findings is that the same students answered each of the three questions, yet the questions were analyzed as if they were functioning independently. It is possible that a student's response on one question might have influenced his or her response on another (i.e., students may have thought they could only identify with one of the disciplines represented or the order in which the questions were presented could influence their responses on subsequent questions).
A second large study of students enrolled in introductory courses for majors in biology, chemistry, and physics at one institution revealed that, on average, science identity between male and female students is equal, but this effect is moderated by how strongly one associates with one's gender [67]. The researchers found that female STEM students who perceive gender as more important to their personal identity had weaker science identity. The salience of gender did not impact the science identity of male STEM students.
The results for advanced undergraduates are also mixed. In a study with primarily third-year students in engineering and chemistry classes at one institution, male students had a stronger STEM identity than female students [68]. However, a second study using a national sample did not find a significant difference between the STEM identities of male and female students [69]. The students in this second study were recruited from the listserv of a national research meeting, though, so they may already have gone through a selective filter before the study because undergraduates with low science identity are unlikely to attend such a meeting.

Limitations of affective measures
One of the largest limitations of these studies is that the majority of them only sampled students once. Without following students through time, it is impossible to determine the impact of a degree program or course. It is interesting that the one study in the sample that did follow students longitudinally through their major revealed that the gap in the affective measure between male and female students grew over time [54]. This implies that something about the college experience was disadvantaging female students and could potentially lead to lower retention.
In addition, it is important to think carefully about how one measures these affective factors. While Likert-scale surveys were used to assess self-efficacy, belonging, and science identity in the studies described above, it is important for researchers to be critical of the validity of surveys prior to administering them. While well-designed published surveys are validated in a particular context, questions are only truly valid for the study population that they were tested on, and it is important for researchers interested in administering a previously designed survey to determine if the questions are valid in their specific context (e.g., questions designed for graduate students may be misinterpreted by undergraduates; see Benson [82] and Netemeyer, Bearden, and Sharma [83]). Another way of investigating these affective measures is through interviews or focus groups, but these qualitative methods often take more time for both data collection and analysis, and the smaller sample sizes may introduce more biases in interpretations. As such, affective measures are sensitive to the specific questions and context in which the data were collected and studies should be evaluated with these criteria in mind.

D. Conclusion: Gender gaps in STEM
Overall, there is evidence for gender gaps in multiple factors across a diverse range of college STEM disciplines, classrooms, and programs. Although the gender inequalities presented above are not meant to be comprehensive and clearly more work needs to be done before we can draw any definitive conclusions, preliminary patterns are emerging. Although there is not a strong pattern for academic performance and there are too few studies on participation to make any broad conclusions, there do seem to be consistent gendered patterns in affective measures. In the studies collected here, women enter college STEM expressing lower self-efficacy and science identity than men and as they advance in their college career, the gaps do not close. The consistencies in affective measures as compared to performance may be due to the fact that academic performance is measured in many diverse ways (e.g., post-test, exam performance, concept inventories) and is not as well defined a construct as some of the affective measures. Another possibility is that students are so motivated to achieve that they are able to overcome differences in affective measures to perform equally [84].
Our second conclusion is a call for a shift towards more systematic data collection. Currently data collection in STEM is done in a somewhat piecemeal way and many of the studies in this review report on only one type of data. More studies that collect multiple measures, longitudinally on the same sample of students, would provide us a more nuanced picture of student experiences in our majors and classrooms.

III. UNDERLYING SOURCES OF GENDER INEQUALITIES IN COLLEGE STEM
Studies that have collected multiple measures on the same students reveal that affective measures such as self-efficacy, science identity, and belonging tend to be correlated with each other and change over time, even within one semester (Good et al. [79]). Similarly, in multiple studies, these measures have been shown to be correlated with achievement and with other affective measures not mentioned above (e.g., values orientation, goal orientation, and many more; see Hernandez et al. [85] and Perez, Cromley, and Kaplan [86]). These correlations imply that there are possibly underlying processes that influence all of these measures together. In the next section, we adapt a motivational model for career interest developed by Wang and Degol [87] to help illustrate these relationships and the proposed underlying factors that could unite them.

A. Adapted framework that incorporates observed factors, psychological factors, and underlying sociocultural factors
Wang and Degol's framework explores how motivation influences career choices. It draws on studies of student experiences in K-12 and connects three levels of influence on student career choices (Fig. 1). This framework seems to be supported by the college-level literature reported here, although the evidence from college STEM is more tentative as there have been fewer studies. Students in most STEM classrooms have already made their initial career choice by choosing their major. Thus, in our adaptation of this model we replace career choice with the decision to persist in STEM.
In Wang and Degol's framework, the observable factors related to career choice are student behaviors such as engagement, course enrollment, and achievement in the domain of one's intended career. These factors align with our factors of performance and engagement in STEM classrooms.
The next level of Wang and Degol's framework includes prior demonstrated ability and psychological factors that can affect persistence directly and/or indirectly through the observable factors. These psychological factors include the invisible factors of self-efficacy, belonging, and science identity that we described earlier, as well as many others. The idea behind the connection between these psychological factors and the observable factors is that students will be more motivated to achieve, engage, and persist if they feel as though they can be successful and if the value they see in their major is worth the cost of remaining in it (Wang and Degol [87]). Self-efficacy is one measure of their belief that they can be successful in STEM; science identity and belonging are two measures of the value they see in their STEM major.
Finally, Wang and Degol propose that underlying sociocultural factors influence these psychological factors. These sociocultural factors are larger cultural norms and values that students and instructors bring into the classroom. They influence students in many ways including their ability to see themselves as scientists (science identity), how other students treat them (which can influence belonging), and their beliefs that they can be successful in STEM (self-efficacy). These sociocultural factors may be the underlying sources of many of the gender inequalities documented in college STEM.
We make two additional modifications to Wang and Degol's model to make it better fit the experiences of women as they progress through their STEM majors. First, Wang and Degol focus explicitly on forces that motivate students, but it is also important for us to consider that there are events outside of a student's control that can impact the observable factors. An example of such an event might be if a female student raises her hand in class, but the instructor does not call on her and calls on a male student instead. In this case, the student has not achieved the outcome of talking in class because an outside factor (the instructor) has obstructed her. To account for these external factors, we added an arrow from the underlying sociocultural factors to the observable outcomes. In addition, the feedback that students receive on their performance or engagement as they progress through an undergraduate degree will influence their subsequent perceptions of self-efficacy, belonging, and identity. To illustrate this longitudinal feedback loop, we added an arrow back from the observable factors towards the underlying factors. We propose this model as an approximation of the interactions among factors that may influence a college STEM student's retention.
We hope instructors and researchers can use this framework to begin to conceptualize gender gaps in their classrooms. While the observable factors of persistence, performance, and engagement can only provide information about who is affected, the psychological factors and sociocultural factors can both begin to explain how some of these observed gender inequalities may have come to exist. This also means that interventions to ameliorate gender gaps in persistence, performance, and/or engagement may only be a metaphorical band-aid to a much deeper problem. If significant changes to the experiences of women in STEM are to be made, this framework posits that the mechanisms surrounding sociocultural factors in STEM classrooms will need to be targeted.

B. Sociocultural factors: Stereotypes and gendered socialization
In this section, we explore four examples of how sociocultural factors could be impacting the persistence of female undergraduates in STEM (Table III). We focus on the impact of stereotypes and gendered socialization and the mechanisms by which they may act in the classroom. We then highlight the evidence that these mechanisms could explain why we see gender gaps in psychological and observable factors.
Although we initially introduce and provide evidence of each sociocultural factor broadly, we only feature articles that assess the impact of these factors on the three populations that are most likely to be relevant to women in STEM: (a) female undergraduates enrolled in a STEM major; (b) female undergraduates in a STEM course who may or may not be majors; (c) females who identify positively with math. As most STEM disciplines require at least some degree of comfort with math, we felt that this third population was appropriate to include.

Stereotypes about women's ability in STEM can lead to stereotype threat
Few people in society today explicitly state that they believe that men are better at science or math than women, but more subtle versions of this message persist. For example, in television programs popularly watched by middle school aged students, male scientists appear 1.6 times as frequently as female scientists. This disparity in representation is even greater in shows not funded by the National Science Foundation: fewer than half the scientists portrayed are women [88]. The lack of female scientists portrayed in the media may send the message that science and math are activities for males. Messages like these have impact: in a worldwide comparative study, countries where people hold weaker associations between science and maleness have reduced achievement gaps between men and women on national math and science assessment exams, whereas countries with stronger associations between science and maleness have larger gender gaps [89]. The United States has an average position on the scale of the association between science and maleness, indicating that these stereotypes exist in our culture.
Even undergraduate women who enroll in STEM disciplines continue to be impacted by the stereotypes about women's ability in STEM: a sample of over 300 female STEM majors found these women still associated STEM with maleness [68] and in a smaller study approximately 25% of female STEM majors endorsed the stereotype that men are better at math [1]. The negative impact of these associations for STEM women can be seen in work done by Nosek and Smyth [15]: college women, even those majoring in STEM courses, show a strong association between math and maleness and the strength of this association predicts their performance on the quantitative section of the SAT [11]. Interestingly, college men enrolled in STEM courses show the strongest endorsement of the "men are better at math" stereotype. Overall, it seems that the stereotypes associating males with science and math influence the experience of women in college STEM.

TABLE III. Evidence for the influence of the underlying processes linking sociocultural factors with the psychological and observable factors for women in STEM. ✓ indicates that there is evidence in the literature for a particular process to impact that factor; the number in parentheses indicates the number of peer-reviewed articles supporting this connection. ✓ ? indicates the link has not been explicitly tested but is implied by published studies. Please refer to text for actual citations.

Psychological factors: Belonging, Self-Efficacy, Science Identity. Observable factors: Performance, Persistence.
Stereotype threat: Belonging ✓ (3); Self-Efficacy ✓ (5); Science Identity ✓ (4); Performance ✓ (13); Persistence ✓ (7)
Biases held by others: ✓ ?; ✓ ?
Alignment of personal goals with perceived science goals; Implicit theories of intelligence: ✓ (1); ✓ (1); ✓ (2)

Stereotype threat is often used to explain the negative influence that male-centric STEM stereotypes have on women in STEM. Stereotype threat is defined as situational pressure posed by the possibility that poor performance will be judged through the lens of a negative group-relevant stereotype [90]. Some of the original studies characterizing stereotype threat were done with women who resemble STEM women: undergraduate women who liked math and for whom math performance was important [91]. In one of these studies, female undergraduates and a matched set of male undergraduates were given an exam with Graduate Record Examination (GRE)-like math questions. Researchers manipulated how women might feel while taking these exams by using one of two verbal prompts: either "Males are known to outperform women on this exam" (stereotype threat treatment) or "Unlike other math exams, this exam does not show gender differences" (no stereotype threat treatment). Women in the no stereotype threat treatment performed almost three times better than the women in the stereotype threat treatment. Thus, women in STEM may be experiencing stereotype threat if they are concerned that their performance will be evaluated based on stereotypes about a woman's ability in these fields.
There is a strong body of literature suggesting that stereotype threat can impact both the retention and performance of women in STEM. Manipulations of the extent to which women experience stereotype threat have repeatedly been shown to modify women's self-reported interest in continuing in STEM [1,68,92-95], although long-term studies of persistence are generally lacking (but see Beasley and Fischer [11]). Another robust body of literature demonstrates that stereotype threat decreases STEM women's performance on math-based exams in artificial experimental settings [91,95-102]. However, we should be somewhat cautious of these findings because only a few investigations have explored the impact of stereotype threat on women's performance on exams in their disciplines or in college STEM classrooms [30,103,104].
Stereotype threat impacts not only the performance and persistence of women in STEM, but also multiple psychological factors, although the number of studies that explore each factor is small. Many of these studies also demonstrate how decreases in these psychological factors are the proximate mechanisms by which stereotype threat decreases performance or retention. Studies have found that stereotype threat reduces science identity for women in a range of disciplines including biology, chemistry, physics, and engineering and some of these studies also show that this reduction in science identity mediates some of the impact of stereotype threat on persistence [68,93,95,105]. Women under stereotype threat also seem to express reduced self-efficacy across diverse STEM disciplines and this impact on self-efficacy can also mediate some of the effect of stereotype threat on persistence [1,30,93,95,101]. Finally, sense of belonging in STEM for undergraduate women, but not graduate women, seems to be impacted by stereotype threat and mediates some of the impact of stereotype threat on performance [63,65,79,105]. This disparity between graduates and undergraduates could be because only women who have overcome stereotype threat persist to graduate school. Alternatively, stereotype threat may manifest in psychological factors other than belonging for graduate students. Thus, the mechanism of stereotype threat, which is derived from gendered stereotypes about who is good at math and science, could explain why we see gender gaps in multiple psychological factors and in observable factors in STEM majors.
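To make the idea of mediation concrete, the following minimal Python sketch applies one common approach, the product-of-coefficients decomposition, to a small synthetic data set. It is not the analysis used in any of the cited studies. The variable names (`threat`, `identity`, `persistence`), the path values, and the balanced "noise" patterns are all assumptions chosen so that the arithmetic is easy to follow: the total effect of the treatment on the outcome splits into a direct effect plus an indirect effect that runs through the mediator.

```python
"""Product-of-coefficients mediation on synthetic data (illustration only)."""


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, outcome):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical design: 0 = no-threat condition, 1 = stereotype threat condition.
threat = [0, 0, 0, 0, 1, 1, 1, 1]
# Balanced patterns standing in for individual variation (invented values).
noise_m = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]

# Invented structural model: threat lowers identity, identity raises persistence.
identity = [5.0 - 1.0 * t + e for t, e in zip(threat, noise_m)]
persistence = [2.0 + 0.5 * m - 0.3 * t + e
               for m, t, e in zip(identity, threat, noise_y)]

a_path = ols([threat], identity)[1]                 # threat -> mediator
_, c_prime, b_path = ols([threat, identity], persistence)  # direct effect, mediator -> outcome
c_total = ols([threat], persistence)[1]             # total effect of threat
indirect = a_path * b_path                          # effect carried by the mediator

print(f"total={c_total:.2f} direct={c_prime:.2f} indirect={indirect:.2f}")
```

In this toy setup the indirect effect accounts for part, but not all, of the total effect, which is the pattern described as partial mediation. A real analysis would also need uncertainty estimates (for example, bootstrapped confidence intervals) and a design that justifies the causal ordering of the variables.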

Stereotypes about women's ability in STEM can lead to biased peers, mentors, or instructors
A second way that math or science-gender stereotypes can impact undergraduate women in STEM is through women's interactions with those around them who believe these stereotypes. A recent series of high-profile papers reminds us that these biases appear to be fairly universal and held by people in STEM at all levels from peers to potential employers. In a study with millennial students who role-played a hiring interview environment, people were more likely to hire men for a math-related job than women even when math performance was identical between the "candidates" [106]. Additionally, a study of STEM departments at elite institutions demonstrated that male faculty members employed fewer female than male graduate students and postdoctoral researchers [107]. Finally, a study across STEM indicated that faculty members held subtle gender biases when selecting a research assistant [108]. When faculty members were asked to rate identical CVs assigned either a male or female name, both male and female faculty members ranked the male candidate as more hirable and competent and reported they would pay and mentor the male candidate more than the female candidate. Thus, biases held by others may limit the opportunities offered to women in STEM.
The actions of others can also lower a student's self-efficacy. One of the major sources of women's self-efficacy is recognition from others [109]. There is evidence that suggests that women in STEM may not be gaining as much recognition as males. First, in a study of three introductory biology courses, men were more likely to name men as knowledgeable about the class material, even after researchers controlled for students' actual performance in the class (Grunspan et al. [110]). In a second study with male engineers, men were more likely to discount ideas that were presented in a female voice compared to ideas presented in a male voice [111]. These seemingly small events may also cumulatively lead to students feeling like they do not belong in an environment [112]. For example, feeling supported by one's peers (an aspect of belonging) was the strongest contributor to student persistence in a study of CS majors [113]. This impaired sense of belonging in STEM for women can predict their interest in continuing to pursue STEM careers [76,79].
Examples such as these demonstrate how psychological factors could be impacted by the biases people have about women in STEM. Promoting awareness of these subtle biases, especially among individuals interacting with women in STEM, is another possible avenue for increasing the retention of women in STEM fields.

Conflict between personal goals and stereotypes about the practices of science
Just as students hold gender-STEM stereotypes, they also hold stereotypes about what it means to be a STEM practitioner. One of these messages is that STEM practitioners are socially isolated individuals working on esoteric problems. In two studies with a combined sample of almost 800 undergraduates from one institution, researchers found that students believe that individuals in STEM professions such as engineering, computer science, and environmental sciences are much less likely to work with others or contribute to society than individuals in non-STEM careers such as architecture or business [114,115]. One potential problem with this stereotype is that women, on average, value working with others and contributing to society (from here on called communal goals) more than men value these goals [116,117]. This difference in values is not innate in women and men, but is hypothesized, using social role theory, to be due to the historical emphasis on women being in more caregiving roles. This has led to the female gender role emphasizing more communal traits (reviewed in Diekman et al. [118]).
Thus, the stereotype that the everyday activities and purpose of STEM are socially isolated is in conflict with the communal values that, on average, women hold more strongly than men. Evidence of this conflict comes from multiple studies. In a series of interviews with 10 graduate students in atmospheric science, women emphasized communal career goals of service and social impact significantly more than men [119]. Additionally, in a national sample of over 9000 college students, women's interest in helping others was negatively correlated with their persistence in STEM [120]. Finally, Smith et al. [93] found that women in STEM who perceived that science could not address communal goals had weaker science identities and this lower science identity predicted their attrition from STEM. Based on these studies, one possible way to address the gender gap in persistence in STEM may be to increase the perception that science can support communal goals. Evidence that this type of intervention could help female students comes from a study by Brown et al. [121] that used an intervention to experimentally manipulate female STEM majors into believing STEM careers could fulfill communal goals and found that this increased their interest in pursuing such a career.

Implicit theories of intelligence and the stereotype of brilliant scientists
Another example of how gendered socialization could impact the persistence of women in STEM relates to intelligence. Women, on average, are more likely to hold an entity view of intelligence: that intelligence is fixed and at some point one reaches a threshold that cannot be exceeded. Men, on average, are more likely to hold a growth view of intelligence: that as long as you work hard, you can get better [122]. This difference is hypothesized to be derived from differential treatment during childhood: girls, on average, are complimented based on innate ability rather than effort, whereas boys are complimented for the effort they put into tasks [122]. The dangers of holding an entity view include giving up in response to failure and avoiding situations where one is at risk of revealing that one has reached one's limit [123,124]. If science is a field that is seen to require high intelligence, then it may seem like a risky field for individuals with an entity mindset and they may choose to avoid it. Unfortunately, it does seem that many fields in science have the stereotype of requiring high innate ability [125].
Few studies in STEM have used implicit theories of intelligence to explore the experience of women. One study in college-level math courses found that students had a more fixed view of math intelligence than general intelligence, but there were no gender differences in this perception [126]. However, the students in this study were in a remedial algebra class and may not reflect students in most undergraduate STEM majors. A second study in a college calculus class found that women who held a fixed view of math ability also had a lower sense of belonging, decreased math performance, and decreased intent to persist in math [79]. Finally, a study across STEM disciplines found that as a field is perceived to require more innate ability rather than hard work, the percentage of female Ph.D.s is lower [125]. These studies imply that changing an individual's view of intelligence and its relationship to STEM fields may be a fruitful avenue for increasing the retention of women in STEM.
Conclusion: Underlying mechanisms. In summary, it is important that researchers and practitioners diagnose and then address the underlying mechanisms that could be influencing the experience of women in STEM. There are many more possible mechanisms than have been laid out in this review and several may be occurring in the same classroom. We present these mechanisms because we believe interventions will be most effective if they address the root causes of gender inequalities rather than the superficial issues. By targeting interventions at these underlying mechanisms, we have the potential to impact all the downstream factors such as self-efficacy, belonging, identity, performance, engagement, and ultimately persistence.

V. RESEARCH RECOMMENDATIONS
This review of research findings on gender inequalities in undergraduate STEM is intended as a starting point for instructors and researchers interested in promoting gender equity in STEM. We offer this framework as a rudder to steer the national conversation away from counting the numbers of women in STEM and toward documenting the more subtle gender gaps that are at the root of those numeric inequalities. We also propose that this framework could help instructors and researchers explore the connections and synergies between the factors impacting the experience of women in undergraduate STEM classrooms.
This framework and the accompanying literature review represent only an initial effort to better define why there are not more women in STEM, and much work remains. We call on instructors and researchers to more systematically catalog and evaluate the experiences of women in college STEM classrooms and programs so that we can better understand the extent to which gender inequalities exist. With a more extensive literature base, we can begin to identify differences and commonalities between STEM fields in terms of these inequalities. Currently, STEM fields appear fairly similar in terms of the experiences of women, but this may be due to lack of data. In addition, rather than collecting gender gap data in a piecemeal fashion, we argue that it would be most informative to collect multiple measures on the same pool of students and to do this longitudinally. This type of data collection would enable researchers to identify (a) the factors that change as a student progresses through a class or program of study as well as the direction of that change, and (b) the relative contributions of different factors to performance and/or persistence in STEM.
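As a rough illustration of what collecting multiple measures on the same students over time makes possible, the Python sketch below stores survey responses in a long format (one row per student, time point, and measure) and computes the gender gap in each measure at each time point, along with the change in that gap. The record layout, the measure names, the function names, and every score are hypothetical and invented for the example; they do not come from any study in this review.

```python
"""Tracking gender gaps across measures and time points (invented data)."""
from collections import defaultdict
from statistics import mean

# (student_id, gender, time_point, measure, score): all values are made up.
records = [
    ("s1", "F", 1, "self_efficacy", 3.0), ("s2", "F", 1, "self_efficacy", 4.0),
    ("s3", "M", 1, "self_efficacy", 4.0), ("s4", "M", 1, "self_efficacy", 4.0),
    ("s1", "F", 2, "self_efficacy", 3.0), ("s2", "F", 2, "self_efficacy", 3.0),
    ("s3", "M", 2, "self_efficacy", 4.0), ("s4", "M", 2, "self_efficacy", 5.0),
    ("s1", "F", 1, "belonging", 4.0), ("s2", "F", 1, "belonging", 4.0),
    ("s3", "M", 1, "belonging", 4.0), ("s4", "M", 1, "belonging", 4.0),
    ("s1", "F", 2, "belonging", 3.0), ("s2", "F", 2, "belonging", 4.0),
    ("s3", "M", 2, "belonging", 4.0), ("s4", "M", 2, "belonging", 4.0),
]


def gender_gaps(rows):
    """Return {(measure, time_point): mean score of men minus mean score of women}."""
    scores = defaultdict(lambda: {"F": [], "M": []})
    for _student, gender, time_point, measure, score in rows:
        scores[(measure, time_point)][gender].append(score)
    return {key: mean(groups["M"]) - mean(groups["F"])
            for key, groups in scores.items()}


def gap_change(gaps, measure, first, last):
    """Change in the gap for one measure between two time points."""
    return gaps[(measure, last)] - gaps[(measure, first)]


gaps = gender_gaps(records)
for measure in ("self_efficacy", "belonging"):
    change = gap_change(gaps, measure, 1, 2)
    print(f"{measure}: gap t1={gaps[(measure, 1)]:.1f}, "
          f"t2={gaps[(measure, 2)]:.1f}, change={change:+.1f}")
```

Because the same student identifiers appear at every time point, the same table could also support within-student analyses, which a single cross-sectional survey cannot.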
In addition, we recommend that future research on the sociocultural and psychological factors that may impact women in STEM take place in actual STEM classrooms. Currently, few studies explore these factors in the specific context of undergraduate STEM and results may be very different in these settings where performance measures (e.g., grades) have real impacts and students are highly motivated to achieve [84]. As STEM practitioners and discipline-based education researchers are not generally trained in psychology or sociology, it may be a more effective and efficient approach to collaborate with sociologists and social psychologists. Sociocultural factors are complicated and nuanced and expertise in the underlying theories is essential to interpreting how these factors are influencing the experience of women in STEM and for generating effective interventions. Conversely, social psychologists argue the mechanisms they have identified, like stereotype threat, are important factors in STEM classrooms, yet the majority of their studies are conducted using psychology students or in psychology labs, a much different context from large introductory science courses. Research partnerships that span the social sciences and natural sciences are likely to be fruitful for both parties, but more importantly are likely to result in a greater impact on the next generation of female STEM scholars.
Our final suggestion is a greater emphasis on intersectionality in studies of women in STEM. Gender is only one of a myriad of identities that a student brings into the classroom and all of these identities impact the experience of students. A cisgender black woman from an upper-middle class background in a STEM classroom may have a very different experience than a transgender Asian American student from a working class background. These intersections of identity have not been deeply explored in the discipline-based education literature. It is important for researchers to be aware that most of the studies presented here were done at primarily white institutions and, thus, the results are likely driven by the experiences of white women. Yet, the experiences of white women may not capture the experiences of women of other races and ethnicities. For example, in a study comparing math-gender stereotypes held by white and black women, white women held a much stronger belief that males are better at math than black women did [127]. This means a black woman's susceptibility to stereotype threat based on gender may be different in the classroom than a white woman's. A second example is the experience of double jeopardy: women of color confront stereotypes about STEM in two social identities, gender and race. Thus, women of color may be more likely to encounter situations where they are made to feel like they do not belong in STEM. In a recent study of 60 female scientists of color, 100% said they had encountered gender bias [128] while another study on black female scientists found 80% reported encountering racial bias [129]. It is essential that research efforts begin to account for both gender and race or ethnicity, in addition to other social identities. In this review there were a few studies that acknowledged and explored the intersection of gender and race, but much more work needs to be done.

VI. CONCLUSION
Through this article we hope to emphasize that gender inequalities are occurring beyond retention and can be found in achievement, participation, and affective measures. Further, all of these measures could be impacted by underlying sociocultural factors that students and instructors bring into the classroom. To effectively ameliorate gender disparities in college STEM, we propose that we need to address the root causes of these disparities. To do so, we need to engage in dialogue with other disciplines spanning the natural and social sciences to probe common factors that influence gender disparities at the collegiate level. Working together, rather than as single disciplines, we hope to move one step closer to gender equality.

FIG. 1. Model describing how gender gaps in affective and observational measures can impact persistence in STEM and the underlying sociocultural factors influencing all of them. Modified from Wang and Degol's [87] model for understanding career choice.

TABLE I. A sample of papers reporting on academic performance and classroom engagement of males and females in two or more STEM classes or with STEM undergraduates.

TABLE II. Survey of articles reporting on gender gaps in affective measures in STEM majors or STEM courses.