Introductory physics students’ recognition of strong peers: Gender and racial/ethnic bias differ by course level and context


Prior work has found conflicting results regarding whether introductory STEM students show a gender bias in their perceptions of their peers [30-32]. These studies observed that students disproportionately nominate men over women as strong in their biology and physics courses [30,32], but that men and women receive comparable numbers of nominations in mechanical engineering courses [31]. Whether these discrepant results are due to varying student populations (e.g., students' majors, academic years, and demographics) in the observed courses, the scientific discipline of the course, or some other factor remains unresolved. We advance this body of work by collecting and analyzing students' nominations of strong peers in three different remote physics courses serving various student populations. We aim to determine how a course's student population and instructional context (lecture or laboratory) are related to students' perceptions of their peers. We also expand previous work by considering possible racial/ethnic biases, in addition to gender biases, in students' nominations.

A. Recognition in STEM
We situate our study in the theoretical framework of identity. An individual's identity refers to their being a "certain kind of person" in a given context [33]. Science identity, therefore, is the degree to which an individual believes they are a "science person." Researchers have conceptualized a model of science and engineering identity containing four dimensions: performance, competence, interest, and recognition [18,34]. Studies show that recognition is one of the most important of these dimensions in predicting students' participation, persistence, and career intentions in science and engineering [18,34-41]. Recognition is the degree to which meaningful others (e.g., peers, teachers, and family) perceive and acknowledge an individual as a science person. When a student receives ample recognition from others, they are likely to see themselves as a science person and develop a strong science identity [37,42].
Given the importance of recognition, it is reasonable to want all students to feel recognized by their peers in a science classroom. Recognition, however, is "culturally produced": it is influenced by sociohistorical norms and stereotypes [18,39,43]. In physics in particular, stereotypes often position men and non-URM students as more suited to the field than women and URM students [12-22, 28, 29]. In turn, empirical work shows that men and non-URM students report a greater sense of recognition in their physics classes than women and URM students, respectively [36,39,44]. Close examinations of women and women of color in physics [18,43,45] also affirm that the success of high-achieving or "exceptional" women in the field hinges on recognition from others. Thus, students' gender and race/ethnicity are closely related to their recognition and identity in physics. Previous work, however, has not analyzed racial/ethnic bias in students' nominations of strong peers; it analyzed gender bias only [30-32]. To address this gap, we measure both gender and racial/ethnic bias in the current study.
Prior research also suggests that recognition may vary across disciplines, instructional formats, and student populations. Grunspan and colleagues [30] examined three iterations of a large introductory biology course (the second in the course sequence) with gender-balanced enrollment. They found in all semesters that men received significantly more nominations than women as strong in the course material. The extent of this bias, however, varied between instruction types: women nominated other women more frequently when the instructor employed 'random call' during class. Salehi and colleagues [31] later performed similar work for two offerings (one with traditional instruction and one with active learning instruction) of a medium-sized, gender-balanced mechanical engineering course taken by second- and third-year students planning to major in engineering. They expected that the nature of the engineering discipline would lead to more gender bias in peer recognition than that found in biology. Instead, they found no gender bias in the observed nominations in either course offering. Lastly, Bloodhart and colleagues [32] analyzed peer perceptions across many introductory life science and physics courses. They do not report the student populations of each course, but, in aggregate, students in the life science courses were mostly non-URM women in their first year of study, and students in the physical science courses were mostly non-URM men spread across all four academic years. The researchers found in both disciplines that men and women alike under-nominated women as knowledgeable in the course material relative to women's actual final grades in the class. To examine whether the different results across these three studies are attributable to varying disciplines and student populations, we analyze three physics courses serving different populations of students.
The studies above also suggest that instruction type, such as a traditionally taught lecture versus an active learning course with frequent group work, may influence recognition. In physics, furthermore, recognition likely varies between the instructional contexts of lecture and laboratory (lab) work. Not only do these two contexts involve very different pedagogies (lectures contain many students who focus on the instructor, while labs contain a small number of students who collaborate on tasks), but they also cover distinct content and aim to develop different sets of skills [46-51]. Correspondingly, research has shown that students associate knowledge of mathematics or theoretical physics with lectures, while they view "doing lab" as handling machinery and using technical skills [12,13]. Such differences in relevant skill sets suggest that different students may be recognized as strong in lecture and lab contexts. Thus, we probe and analyze students' recognition of peers in the two contexts separately in the current study.

B. Current study
We collected students' nominations of peers they believed were strong in lecture and lab contexts in three different remote, introductory physics courses at Cornell University. These data allowed us to compare peer recognition across instructional contexts (within our study) as well as across disciplines (comparing our study of physics courses to prior research on courses in other disciplines). The three physics courses also contained different student populations in terms of students' majors, academic years, genders, and races/ethnicities. This variation allowed us to examine whether gender and racial/ethnic biases in peer recognition depend on features of the student population, as suggested by prior studies [30-32]. The following two research questions guided our study:
1. To what extent do gender and racial/ethnic biases exist in students' recognition of strong peers across three different introductory physics courses serving distinct student populations?
2. How does introductory physics students' recognition of strong peers differ between lecture and lab contexts?
Comparing the three courses in our study to previous studies [30-32], we find that whether students' perceptions of strong peers in lecture exhibit a gender bias might depend on course level rather than on other variables (e.g., student population or scientific discipline of the course). Courses serving first-year students exhibit a gender bias in lecture perceptions, while those serving students beyond the first year do not. Surprisingly, we also find in some cases that URM students tend to receive more nominations than their non-URM peers. With regard to instructional context, we observe that both the general patterns of nominations and the presence of gender or racial/ethnic bias in nominations differ between the lecture and lab contexts.

II. METHODS
In this section, we first characterize the courses and students we analyzed. Then, we describe our data collection and analysis methods.

A. Courses and participants
Our data come from three introductory physics courses at Cornell University, which we call Courses A, B, and C. These courses were held during the Fall 2020 semester, when about three-quarters of students at the institution resided on campus but most courses were held online due to the COVID-19 pandemic. Course A was an introductory, calculus-based mechanics course aimed at first-year students intending to major in engineering or other STEM disciplines. Course B was also an introductory, calculus-based mechanics course, but primarily served first-year students intending to major in physics.

TABLE I: Summary of instruction modality, student demographics, and student final course grades for the three courses we analyzed. All online meetings were synchronous. Percentages are relative to the number of students included in the analysis. We denote students' gender or race/ethnicity as 'unknown' if they preferred not to disclose this information on the survey or if they did not complete the survey (but did consent to other parts of the research). Grades are provided on a 4.0 scale.

As summarized in Table I, each course had lecture, lab, and discussion sections. Most course components were held online (synchronously), with a few held in person. Lectures for all three courses were taught by a male faculty member in the physics department. Courses A and C were "flipped," such that students read relevant sections of the textbook and took a reading quiz before coming to class. Lectures for Course A used conceptual iClicker questions and instructor demonstrations, while lecture time in Course C was spent on problem-solving questions through Learning Catalytics [52]. In both courses, students answered questions both individually and in groups. Course B was more traditional in that, during lectures, the instructor presented new content and asked iClicker questions that students answered individually. Courses B and C (but not Course A) used an online discussion forum where both students and teaching staff could post questions and answers related to course content at any time.
In all three courses, lab and discussion sections were led by graduate teaching assistants. Lab sections met once per week for two hours and discussion sections met twice per week for 50 minutes. Each section contained approximately 20 students who worked together in small groups of two to four. The labs and most of the discussion sections took place online through Zoom, and students worked in groups in virtual breakout rooms. In the few discussion sections held in person, students worked together at round tables. Labs were inquiry-based [46-51], where students performed open-ended

We collected students' self-reported gender, race/ethnicity, intended major, and academic year via a survey at the beginning of the semester. We grouped race/ethnicity by URM status: non-URM students are those identifying solely as White and/or Asian/Asian American, and URM students are those identifying as at least one other race/ethnicity (including Black or African American, Hispanic/Latinx, and Native Hawaiian or other Pacific Islander). These student populations and average final course grades are summarized in Table I. Similar proportions of men (42%) and women (47%) were enrolled in Course A, and roughly three-quarters of the students in this course identified as non-URM (71%). Men and women received comparable final course grades on average in this course, as did URM and non-URM students. Most students in Course B were men (70%), and more than three-quarters of the students identified as non-URM (81%). Men and non-URM students on average received higher final course grades in this course than women and URM students, respectively. Course C contained similar proportions of men (45%) and women (51%), and two-thirds of the students identified as non-URM (66%). Men and women on average received comparable final course grades in this course, while non-URM students on average received higher final course grades than URM students.
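The URM grouping rule described above can be expressed as a small classification function. This is a minimal sketch: the category strings (e.g., `"White"`, `"Asian/Asian American"`) are hypothetical labels standing in for the survey's actual response options, and treating an empty selection as 'unknown' is our assumption, chosen to mirror the 'unknown' category in Table I.

```python
from typing import Optional, Set

# Hypothetical survey labels; the actual response options may be worded differently.
NON_URM_CATEGORIES = {"White", "Asian/Asian American"}


def urm_status(selected: Optional[Set[str]]) -> str:
    """Classify a student's self-reported race/ethnicity categories by URM status."""
    if not selected:
        # No disclosure or no survey response (assumption: reported as 'unknown').
        return "unknown"
    if selected <= NON_URM_CATEGORIES:
        # Identifies solely as White and/or Asian/Asian American.
        return "non-URM"
    # Identifies with at least one other race/ethnicity.
    return "URM"


print(urm_status({"White", "Asian/Asian American"}))            # non-URM
print(urm_status({"Asian/Asian American", "Hispanic/Latinx"}))  # URM
```

The subset test (`<=`) encodes "solely": any additional category outside the non-URM set moves the student into the URM group.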

B. Data collection
In all three courses, we administered an online survey during the eighth week of the 15-week semester as part of a lab assignment about students' group work experiences. On the survey, we asked students to nominate peers whom they believed were knowledgeable in the course, using the following two prompts adapted from prior work [30-32]:
"Please list any students in this physics class that you think are particularly strong in the lecture/discussion section material."
"Please list any students in this physics class that you think are particularly strong in the lab material."
We refer to the first prompt as "lecture perceptions" and the second prompt as "lab perceptions." The survey used an open-response format (one text box), and students could respond with an unlimited number of names. This format avoids students feeling obligated to fill a quota and writing down extra names of peers they may not actually perceive as strong [53]. Students were also not given a class roster from which to choose or look up names. As a result, some listings were hard to match to the class roster during analysis, as students sometimes misspelled peers' names or reported only a first or a last name. Thus, text processing of the responses was necessary. We compared each name reported on the survey to the class roster and matched names for which the number of character corrections needed to match the full name on the roster was fewer than 0.3 times the length of that full name. We chose the constant 0.3 via trial and error, finding that it worked best for capturing as many close matches as possible without producing false positives (incorrect matches). If a name (either first or last) appeared multiple times in the data set, then we did not match listings consisting of that name alone and only matched listings of the other half of the name or the full name.
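The matching rule can be sketched with the Levenshtein edit distance as the "number of corrections." This is an illustrative sketch under stated assumptions, not the authors' code: we assume the roster is a list of (first, last) pairs, that single-token listings are compared against each half of a roster name using the same 0.3 ratio scaled to that half's length, and that listings fitting more than one roster entry are left unmatched. The helper names `levenshtein` and `match_listing` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

RATIO = 0.3  # threshold constant reported in the text


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def match_listing(listing: str, roster: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the unique (first, last) roster entry matching a free-text listing, else None."""
    text = " ".join(listing.lower().split())
    # How often each first or last name occurs on the roster (to detect ambiguous halves).
    half_counts = Counter(part.lower() for first, last in roster for part in (first, last))
    matches = []
    for first, last in roster:
        full = f"{first} {last}".lower()
        close = levenshtein(text, full) < RATIO * len(full)
        if not close and " " not in text:
            # Single-token listing: compare against each half, skipping names shared by several students.
            # Assumption: the threshold scales with the length of that half.
            close = any(levenshtein(text, half) < RATIO * len(half)
                        for half in (first.lower(), last.lower())
                        if half_counts[half] == 1)
        if close:
            matches.append((first, last))
    # Assumption: listings that fit several roster entries are left unmatched.
    return matches[0] if len(matches) == 1 else None


roster = [("Alice", "Nguyen"), ("Alice", "Smith"), ("Brandon", "Okafor")]
print(match_listing("alise nguyen", roster))  # ('Alice', 'Nguyen')
print(match_listing("Alice", roster))         # None (first name shared by two students)
```

Raising `RATIO` captures more misspellings but also admits more incorrect matches, which is the trade-off the trial-and-error choice of 0.3 balances.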
As summarized in Table II, survey response rates were high (all above 75%), and students in each course on average listed one or two peers for each prompt. Our analysis included all students who responded to the survey and/or were listed by at least one peer. We included only the nominations made by students who consented to participate in research (more than 95% of survey respondents). If a consenting student nominated a non-consenting student, we included the nomination but removed all information (demographics, etc.) about the non-consenting student. In all courses, at least 93% of the enrolled students are included in our analysis (see Table I).
We note that prior studies [30,31] administered surveys late in the semester, which produced highly centralized networks (many nominations were concentrated on a few students) for course-level perceptions. In our study, two of the three courses exhibited highly centralized lecture perception networks at the mid-semester mark, so we have no reason to believe that the timing of survey administration substantially affects our results.
At the end of the semester, we collected discussion and lab section enrollment for all courses. For Courses B and C, we also collected the number of each student's contributions to the course's online discussion forum (the sum of their posted questions and their posted answers to others' questions). Course A did not use a discussion forum. We used these discussion forum contributions as a measure of students' outspokenness because they quantify students' communicative engagement during an online course. This measure is similar to, but distinct from, that of Grunspan and colleagues [30], who determined outspokenness by asking the instructor to name actively participating students after each class meeting. Some students had no discussion forum data, indicating that they likely never registered for or used the forum. For these students, we imputed their contributions to the discussion forum as zero, which was also the mode of each course's distribution of contributions. We imputed discussion forum data for one student in Course B and 55 students in Course C.
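The outspokenness measure and its zero imputation amount to a simple per-student sum with a default. The sketch below assumes the forum export is a mapping from a student ID to a (questions, answers) pair; the IDs and counts are hypothetical and only illustrate the rule.

```python
from statistics import mode
from typing import Dict, List, Tuple


def forum_contributions(enrolled: List[str],
                        forum: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Outspokenness = posted questions + posted answers; students missing from the forum data get 0."""
    return {s: (sum(forum[s]) if s in forum else 0) for s in enrolled}


# Hypothetical student IDs and (questions, answers) counts.
enrolled = ["s1", "s2", "s3", "s4"]
forum = {"s1": (2, 3), "s2": (0, 0), "s3": (1, 0)}
contributions = forum_contributions(enrolled, forum)
print(contributions)                 # {'s1': 5, 's2': 0, 's3': 1, 's4': 0}
print(mode(contributions.values()))  # 0
```

Imputing zero is least disruptive when, as reported for each course, zero is already the most common value.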

C. Analysis of nominations
We converted the survey responses into directed networks for each course (A, B, and C) and each context (lecture and lab). Nodes represented students, and edges (or ties) represented all nominations made between students. To first gain a sense of each network's overall structure, we calculated two network-level statistics: density and indegree centralization. Density measures the proportion of all possible edges in the network that we observed. Indegree centralization measures the extent to which the nominations are concentrated around a single student or a small subset of students (i.e., whether there are emergent celebrities who receive most of the nominations). Higher indegree centralization indicates higher concentration around one or a few students. We determined the standard errors of each of these statistics via bootstrapping: resampling the observed network many times, calculating the statistic for each sampled network, and then determining the standard deviation of the statistic among all of the sampled networks [54,55]. The bootstrapping was performed with 10,000 bootstrap trials for each network using the snowboot package in R [56].
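A minimal Python sketch of the two network-level statistics and a bootstrap standard error follows. Assumptions: nodes are labeled 0 to n-1, there are no self-nominations, and indegree centralization uses the Freeman form normalized by its directed maximum, (n-1)^2. The resampling shown is a plain vertex bootstrap (nodes drawn with replacement), swapped in for illustration; it is not the snowball-based procedure implemented in snowboot, and the function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Set, Tuple

Edge = Tuple[int, int]  # (nominator, nominee); nodes are labeled 0..n-1


def density(n: int, edges: Set[Edge]) -> float:
    """Fraction of the n(n-1) possible directed ties (no self-nominations) that are present."""
    return len(edges) / (n * (n - 1))


def indegree_centralization(n: int, edges: Set[Edge]) -> float:
    """Freeman indegree centralization: 0 if all indegrees are equal, 1 if everyone nominates one node."""
    indeg = [0] * n
    for _, nominee in edges:
        indeg[nominee] += 1
    top = max(indeg)
    return sum(top - d for d in indeg) / ((n - 1) ** 2)


def vertex_bootstrap_se(n: int, edges: Set[Edge],
                        stat: Callable[[int, Set[Edge]], float],
                        trials: int = 1000, seed: int = 0) -> float:
    """Bootstrap standard error from resampling nodes with replacement (simple vertex bootstrap)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]
        # Position a nominates position b if original node sample[a] nominated sample[b];
        # duplicated copies of one node are never tied, since there are no self-nominations.
        boot = {(a, b) for a in range(n) for b in range(n)
                if a != b and (sample[a], sample[b]) in edges}
        values.append(stat(n, boot))
    return statistics.stdev(values)


star = {(1, 0), (2, 0), (3, 0)}  # everyone nominates node 0
print(density(4, star), indegree_centralization(4, star))  # 0.25 1.0
```

Passing `trials=10000` mirrors the number of bootstrap trials in the text; the nested loop makes each trial cost O(n^2), which is acceptable for class-sized networks.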
We then used exponential random graph models (ERGMs) to statistically determine the salient structural characteristics of our networks. ERGMs assume that the observed network is a realization of a random graph drawn from a distribution belonging to the exponential family [57,58]. They allow us to perform many statistical tests at once, determining whether the frequency of certain patterns or configurations in our observed network is significantly different than if the ties were formed randomly. To formulate such a model, we first choose a principled set of predictor variables (i.e., configurations) that might explain the formation of the observed network. These variables may be structural (e.g., measuring the tendency for mutual nominations) or nodal (e.g., measuring the extent to which students of a certain gender are more likely to receive a nomination). The goal is to use these network statistics $g_k(y)$ and their corresponding coefficients $\theta_k$ to predict the formation of the random network $Y$. The model takes the form

$$P(Y = y) = \frac{1}{\psi} \exp\Big( \sum_k \theta_k \, g_k(y) \Big),$$

where $y$ is a realization of the random network $Y$ and $\psi = \sum_{y} \exp\big( \sum_k \theta_k \, g_k(y) \big)$, summed over all possible networks $y$, is a normalization constant that ensures that the probabilities sum to one. Given an observed network $y$, the coefficients of the model are estimated using maximum likelihood estimation (MLE). Due to the dependence between network ties, the MLE is commonly approximated with Markov chain Monte Carlo (MCMC) techniques [59], which we used to fit all models in our analysis.
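To make the role of the normalization constant $\psi$ concrete, the toy example below evaluates the ERGM probability exactly by enumerating every possible directed network on three nodes. The two statistics (number of ties and number of mutual pairs) and the coefficient values are hypothetical choices for illustration only. Enumeration is feasible here because there are just $2^6 = 64$ networks; for $n$ students there are $2^{n(n-1)}$, which is why MCMC approximations are needed for real class networks.

```python
import itertools
import math
from typing import Dict, FrozenSet, Tuple

N = 3  # nodes; 3 * 2 = 6 possible directed ties, so 2**6 = 64 possible networks
DYADS = [(i, j) for i in range(N) for j in range(N) if i != j]
Network = FrozenSet[Tuple[int, int]]


def g(y: Network) -> Tuple[int, int]:
    """Network statistics g_k(y): (number of ties, number of mutual pairs)."""
    mutual = sum(1 for (i, j) in y if i < j and (j, i) in y)
    return len(y), mutual


def ergm_probabilities(theta: Tuple[float, float]) -> Dict[Network, float]:
    """Exact P(Y = y) = exp(sum_k theta_k g_k(y)) / psi, enumerating every possible network."""
    networks = [frozenset(d for d, on in zip(DYADS, bits) if on)
                for bits in itertools.product((0, 1), repeat=len(DYADS))]
    weights = {y: math.exp(sum(t * s for t, s in zip(theta, g(y)))) for y in networks}
    psi = sum(weights.values())  # normalization constant
    return {y: w / psi for y, w in weights.items()}


probs = ergm_probabilities((-1.0, 2.0))  # hypothetical (theta_edges, theta_mutual)
print(round(sum(probs.values()), 12))    # 1.0
```

With these coefficients, the complete network (6 ties, 3 mutual pairs) and the empty network receive the same weight, since $-1 \cdot 6 + 2 \cdot 3 = 0$.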
There are two different ways to interpret the coefficients of ERGMs. In the first, more general interpretation, the coefficients weight the importance of each modeled configuration for the formation of the realized network, where positive (negative) coefficients show that the configuration is observed more (less) frequently than by chance after accounting for all other configurations that are modeled. The second way to interpret the coefficients is to focus on specific ties of the network. In this "change statistics" interpretation, the coefficient $\theta_k$ of the $k$th configuration shows how much the log-odds of a tie being present change if the formation of the tie increases the $k$th configuration by one unit, holding the rest of the network constant. For instance, if the predictor variable measures the number of mutual ties in the network, its coefficient represents how much the log-odds of a tie being present increase when the addition of this tie would reciprocate an existing tie.
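The change-statistics reading follows directly from the model equation, because the normalization constant $\psi$ cancels when comparing two networks that differ in only one tie. In the short derivation below, the notation is introduced here for illustration: $y_{ij}^{+}$ and $y_{ij}^{-}$ denote the network $y$ with the tie $i \to j$ set present or absent, and $y_{ij}^{c}$ denotes all other ties.

$$
\begin{aligned}
\frac{P(Y_{ij}=1 \mid Y_{ij}^{c}=y_{ij}^{c})}{P(Y_{ij}=0 \mid Y_{ij}^{c}=y_{ij}^{c})}
  &= \frac{\exp\big(\sum_k \theta_k \, g_k(y_{ij}^{+})\big)/\psi}{\exp\big(\sum_k \theta_k \, g_k(y_{ij}^{-})\big)/\psi}
   = \exp\Big(\sum_k \theta_k \, \delta_{k,ij}(y)\Big), \\
\delta_{k,ij}(y) &= g_k(y_{ij}^{+}) - g_k(y_{ij}^{-}), \\
\operatorname{logit} P(Y_{ij}=1 \mid Y_{ij}^{c}=y_{ij}^{c}) &= \sum_k \theta_k \, \delta_{k,ij}(y).
\end{aligned}
$$

For a model with only an edges term and a mutual term, adding $i \to j$ when $j \to i$ already exists raises both statistics by one, so the log-odds of that tie is $\theta_{\text{edges}} + \theta_{\text{mutual}}$; if $j \to i$ is absent, only the edge count changes and the log-odds is $\theta_{\text{edges}}$.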
We initially fit ERGMs to each of our observed networks with the same set of predictor variables used by Grunspan and colleagues [30]. Our model contained one additional variable for discussion section homophily (the tendency for students to nominate peers enrolled in their own discussion section), because discussion was an extra structural component in our courses. We also added three variables to measure effects of race/ethnicity, which exactly mirrored the structure of the gender variables in the original model of Grunspan and colleagues [30]. Inspection of the goodness-of-fit diagnostics, however, revealed a significant inadequacy in this model: it did not appropriately capture the presence of triadic closure. Triadic closure is the tendency for three nodes to be connected, given pairwise connections. That is, if ties exist between nodes A and B and between nodes B and C, then a tie between nodes A and C forms triadic closure. In some cases, the model also did not adequately capture the network's outdegree distribution (the proportion of nodes making a certain number of nominations). The model did sufficiently account for each network's indegree distribution (the proportion of nodes receiving a certain number of nominations).
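The quantities checked in these goodness-of-fit diagnostics can be computed directly from an edge list. In this sketch, triadic closure is operationalized as a directed transitive triple (A → B, B → C, and the closing tie A → C); this is one common directed version of the definition above, chosen here as an assumption. The degree distributions are reported as the proportion of nodes with each number of nominations made (out) or received (in). Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (nominator, nominee)


def closed_triads(edges: Set[Edge]) -> int:
    """Count transitive triples: A -> B and B -> C, closed by A -> C."""
    out: Dict[str, Set[str]] = {}
    for a, b in edges:
        out.setdefault(a, set()).add(b)
    return sum(1 for a, b in edges for c in out.get(b, set())
               if c != a and (a, c) in edges)


def degree_distributions(nodes: Iterable[str], edges: Set[Edge]):
    """Proportion of nodes making (out) or receiving (in) each number of nominations."""
    nodes = list(nodes)
    outdeg = Counter(a for a, _ in edges)
    indeg = Counter(b for _, b in edges)
    n = len(nodes)
    out_dist = Counter(outdeg[v] for v in nodes)
    in_dist = Counter(indeg[v] for v in nodes)
    return ({k: c / n for k, c in out_dist.items()},
            {k: c / n for k, c in in_dist.items()})


tri = {("A", "B"), ("B", "C"), ("A", "C")}
print(closed_triads(tri))                               # 1
print(closed_triads({("A", "B"), ("B", "C"), ("C", "A")}))  # 0 (a cycle is not closed transitively)
```

In a goodness-of-fit check, these statistics would be compared between the observed network and networks simulated from the fitted model.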
In response, we altered the original model from Grunspan and colleagues [30]. We added a geometrically weighted edgewise shared partner (GWESP) variable, which is typically used to account for triadic closure. Under this term, the more partners two nodes have in common (i.e., the more shared partners they have), the higher the probability of an edge forming between them, but with diminishing returns: a decay parameter determines how much less each additional shared partner contributes to the probability of tie formation [60]. This parameter can take on a value between 0 and 1, with lower values producing faster-diminishing contributions per subsequent shared partner. We used a fixed decay parameter of 0.25, as is common in the ERGM literature [61-63]. Because incorporating both a GWESP term and an isolates term (for students receiving zero nominations; used in the original model) produced model degeneracy, we removed the isolates term. We also changed the structure of the gender and race/ethnicity variables to allow for easier and more meaningful interpretations. Specifically, as in Ref. [64], we added a variable to the model for each possible directed tie type for the gender and race/ethnicity attributes (e.g., man nominating a man, man nominating a woman, etc.). These network statistics allowed for a more direct comparison of ties by using common base terms for the gender and race/ethnicity variables, and thus an easier interpretation of gender and racial/ethnic biases. In creating these variables, we fit models to the observed networks with each possible base term for gender and race/ethnicity. We ultimately chose nominations between majority demographic groups as the base terms; however, the results are consistent across all of the possible models.
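The sketch below computes a GWESP statistic in its standard geometrically weighted form, $e^{\alpha} \sum_s [1 - (1 - e^{-\alpha})^s] \, EP_s$, where $EP_s$ is the number of ties with exactly $s$ shared partners and $\alpha$ is the decay parameter. For directed networks, "shared partner" has several variants; we assume the outgoing two-path definition (a partner $k$ of the tie $i \to j$ satisfies $i \to k$ and $k \to j$), since which variant the model used is not stated here. The sketch also shows the diminishing marginal weight, $(1 - e^{-\alpha})^{s-1}$, of the $s$-th shared partner at the decay value of 0.25 used in the text.

```python
import math
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def esp_counts(edges: Set[Edge]) -> Counter:
    """EP_s: number of ties i -> j with exactly s shared partners k such that i -> k and k -> j."""
    out: Dict[str, Set[str]] = {}
    for i, j in edges:
        out.setdefault(i, set()).add(j)
    counts: Counter = Counter()
    for i, j in edges:
        s = sum(1 for k in out[i] if k != j and j in out.get(k, set()))
        counts[s] += 1
    return counts


def gwesp(edges: Set[Edge], decay: float = 0.25) -> float:
    """GWESP = e^decay * sum_s [1 - (1 - e^-decay)^s] * EP_s (ties with s = 0 add nothing)."""
    r = 1.0 - math.exp(-decay)
    return math.exp(decay) * sum((1.0 - r ** s) * c
                                 for s, c in esp_counts(edges).items())


def marginal_weight(s: int, decay: float = 0.25) -> float:
    """Extra weight contributed by a tie's s-th shared partner: (1 - e^-decay)^(s - 1)."""
    return (1.0 - math.exp(-decay)) ** (s - 1)


for s in (1, 2, 3):
    print(s, round(marginal_weight(s), 4))
# 1 1.0
# 2 0.2212
# 3 0.0489
```

At a decay of 0.25, a second shared partner adds only about a fifth of the weight of the first, so the term rewards closing a triangle without letting a few heavily clustered ties dominate the model.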
These modifications resulted in an improved model fit for every observed network. For all six observed networks, the goodness-of-fit diagnostics showed that the revised model captured the distributions of indegree, outdegree, and triadic closure well. Coefficient estimates using the original model of Ref. [30] and an example of goodness-of-fit metrics for both models are provided in the Appendix. In the main text, we report our results using the revised model, which contained the following predictor variables:

We determined the coefficient estimates of these variables for each of our six observed networks using MCMC MLE and then compared the results across courses and contexts.

III. RESULTS
We first compare the structures and network-level statistics of each observed network. Then, we present the results of the exponential random graph models.
Figures 1 and 2 show the network diagrams of the lecture and lab perception networks, respectively, for each course. In each diagram, edges point from the nominator to the nominee, and larger nodes represent students who received more nominations (i.e., higher indegree). Nodes are colored by gender, and nodes with bold outlines indicate celebrities. These same network diagrams with nodes colored by race/ethnicity can be found in the Supplementary Material.
Table III summarizes the density and indegree centralization measures for each observed network. Within each course, the densities of the lecture and lab perception networks are similar to one another. This appears in Figs. 1 and 2 as similar levels of connectedness (proportion of possible edges present) for the lecture and lab perception networks within each course. This similarity in density, however, holds despite the very different structures of these connections across contexts. That is, students nominate similar numbers of peers in each network, but the distribution of who receives the nominations differs between the lecture and lab contexts.
In Course A, the indegree centralization (the extent to which the network is concentrated around just a few students) is similarly low for both lecture and lab perceptions. Correspondingly, we see in Figs. 1 and 2 that there are no emergent celebrities in either network for this course (no nodes are drastically larger than the others). In Courses B and C, however, the indegree centralization value for the lecture perception network is larger than that for the lab perception network by an order of magnitude. This suggests that the lecture perception networks of these two courses are much more concentrated around a few prominent students (celebrities) than the lab perception networks. We observe in Fig. 1 that the lecture perception networks of Courses B and C contain three and two celebrities (nodes with bold outlines that are much larger than the rest, having received many more nominations), respectively. On the other hand, we see in Fig. 2 that there are no central nodes receiving many nominations in either of the two lab perception networks (all nodes are similar in size). Thus, for Courses B and C, despite the similar density measures, the lecture perception networks are much more concentrated around a few celebrities, with no outstanding celebrities in the lab perception networks.

B. Evaluating gender and racial/ethnic bias in peer recognition
Table IV shows the coefficient estimates for our revised exponential random graph model fit to all observed networks. We interpret the coefficient estimates as changes in the log-odds of tie formation. For example, the coefficient estimate for the homophily on discussion section variable for Course A's lecture perception network is 1.59. This means that the log-odds of a tie forming in the network increase by 1.59 when that tie would connect two students in the same discussion section, holding the rest of the network constant. In other words, ties connecting students in the same discussion section are more probable than ties connecting students in different discussion sections, even after accounting for the other configurations included in the model.
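Converting the reported coefficient to an odds ratio makes its size easier to read: a log-odds shift of 1.59 multiplies the odds of the tie by $e^{1.59} \approx 4.90$. The helper `shifted_probability` is a hypothetical illustration; its baseline probability `p0` is not taken from the data, and it shows that the same log-odds shift changes the tie probability by different amounts depending on that baseline.

```python
import math


def odds_ratio(theta: float) -> float:
    """Multiplicative change in the odds of a tie when its change statistic increases by one."""
    return math.exp(theta)


def shifted_probability(p0: float, theta: float) -> float:
    """Tie probability after adding theta to the log-odds of a baseline probability p0."""
    log_odds = math.log(p0 / (1.0 - p0)) + theta
    return 1.0 / (1.0 + math.exp(-log_odds))


# Coefficient 1.59: homophily on discussion section, Course A lecture perception network.
print(f"{odds_ratio(1.59):.2f}")  # 4.90
```

Because nominations are sparse, baseline tie probabilities are small, and in that regime an odds ratio of about 4.9 corresponds to a tie probability that is also roughly 4.9 times larger.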
For five out of six analyzed networks, the coefficient estimates for the woman → woman and woman → man variables, shown as light and medium green dots in Fig. 3, are not statistically significant. This means that the frequency with which women nominate either a woman or a man is not significantly different from the frequency with which men nominate other men (the base term) after adjusting for the other variables in the model. In other words, women proportionately nominate their female and male peers in these five networks. In the lecture perception network of Course C, however, women nominate other women significantly more than men nominate other men.
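The directed tie-type variables compared here count ties by the (sender group, receiver group) combination, with the base combination (man → man) omitted so that the other coefficients are read relative to it. The sketch below computes these counts, which are the network statistics $g_k(y)$ for the gender terms. It is a hypothetical helper, and it assumes that ties involving a student of unknown gender are left out of these particular counts. The counts alone do not establish bias: the ERGM estimates their coefficients jointly with grades, section homophily, GWESP, and the other terms.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (nominator, nominee)


def mixing_statistics(edges: Set[Edge],
                      group: Dict[Hashable, Optional[str]],
                      categories: Tuple[str, ...] = ("man", "woman"),
                      base: Tuple[str, str] = ("man", "man")) -> Dict[Tuple[str, str], int]:
    """Count ties by (sender group, receiver group), omitting the base (reference) combination."""
    counts = Counter((group.get(a), group.get(b)) for a, b in edges)
    # Ties touching a node outside `categories` (e.g., unknown gender) are excluded (assumption).
    return {(s, r): counts[(s, r)]
            for s in categories for r in categories if (s, r) != base}


# Hypothetical students and nominations.
group = {1: "man", 2: "woman", 3: "man", 4: "woman", 5: None}
edges = {(1, 2), (3, 1), (2, 4), (4, 1), (5, 1)}
mix = mixing_statistics(edges, group)
print(mix)  # one woman -> woman, one woman -> man, and one man -> woman tie
```

Choosing a different `base` re-expresses the same model with a different reference category, which is why the text reports that results were consistent across all possible base terms.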
The coefficient estimates for the man → woman variable, shown as dark green dots in Fig. 3, indicate that men significantly under-nominate women in the lecture perception network of Course A and in both networks of Course B. The lecture and lab perception networks of Course B, moreover, have the largest and second-largest coefficient magnitudes for this variable, respectively. This suggests that the strongest gender bias occurs in Course B's lecture perception network. Making direct comparisons of ERGM coefficients across different networks, however, has limitations [65], so we consider this claim preliminary. We note that the coefficient estimate for the man → woman variable is not statistically significant in the lecture perception network of Course C; however, this might be due to one of the two celebrities having unknown gender. If we impute this celebrity's gender as a man, the man → woman variable becomes negative and statistically significant (implying a gender bias against women), though the other gender variables are not statistically significant. If we impute this celebrity's gender as a woman, the results related to gender are the same as when this celebrity's gender is unknown. The dependence of the statistical results on this one celebrity's gender is an important caveat to our interpretations, discussed in the next section.

[FIG. 3 caption, partially recovered: "... IV) and asterisks indicate statistical significance."]
The gender patterns suggested by our model fits are illustrated in Figs. 1 and 2. Despite there being no clear celebrities in the lecture perception network of Course A (the top-nominated man and woman received five and four nominations, respectively), our statistical model indicates a gender bias in this network: men, on average, receive more nominations than women (the average size of the blue nodes is greater than the average size of the yellow nodes). For the lab perception network of Course A, men and women have an even distribution of nominations (the average size of the blue nodes is similar to the average size of the yellow nodes), as indicated by our statistical analysis. On the other hand, all three emergent celebrities in Course B's lecture perception network are men (the three largest nodes, outlined in bold, are blue). In this network, the top-nominated man received ten times as many nominations (30) as the top-nominated woman (three). Course B's lab perception network is less centralized around a few celebrities, but we see that, on average, men received more nominations than women (the average size of the blue nodes is greater than the average size of the yellow nodes). For Course C, the lecture perception network has one male celebrity and a second celebrity with unknown gender (these two nodes are outlined in bold and overlap in the diagram). As mentioned above, the latter might explain why we did not resolve any gender bias in our model fit. Finally, Course C's lab perception network is similar to that of Course A in that there is a relatively even distribution of nominations across men and women, in line with our quantitative findings above.
We examine the coefficient estimates for the race/ethnicity variables in a similar manner. These results are summarized in Fig. 4. In both networks of Course A and in the lab perception network of Course C, none of the coefficient estimates for the race/ethnicity variables are statistically significant after adjusting for the remaining variables in the model. This means that no particular nomination type (e.g., URM student nominating URM student) occurs more frequently than another: both URM and non-URM students proportionately nominate their URM and non-URM peers (no racial/ethnic bias). In both networks of Course B, however, URM students disproportionately over-nominate URM peers, even after controlling for the other network configurations in the model (yellow dots in Fig. 4). Similarly, in Course C's lecture perception network, URM students significantly under-nominate non-URM peers (orange dots in Fig. 4) and non-URM students disproportionately over-nominate URM peers (brown dots in Fig. 4). Consistent with this, one of the two prominent celebrities in this particular network (shown in Fig. 1) is a URM student.
Similar to the results related to gender, we note that the results for the lecture perception network of Course C change if we impute the race/ethnicity of the second celebrity, whose race/ethnicity is unknown. If we impute this student as non-URM, we find that URM students disproportionately over-nominate URM peers in this network, with no change to the other two race/ethnicity variables. If we impute this student as URM, we find that both URM and non-URM students disproportionately over-nominate URM peers. In both cases, therefore, we still find a tendency for URM students to receive more nominations than their non-URM peers.
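The nomination types compared in Figs. 3 and 4 (e.g., URM → URM versus the non-URM → non-URM base term) start from a cross-tabulation of nominations by the demographic group of the nominator and of the nominee. The Python sketch below, with a hypothetical function name, computes only that raw tally. Unlike the ERGM, it does not adjust for final course grade, section homophily, or other network configurations, so on its own it cannot indicate bias.

```python
from collections import Counter


def mixing_counts(edges, group):
    """Tally nominations by (nominator group, nominee group).

    `edges` holds (nominator, nominee) pairs; `group` maps each student to a
    label such as "URM" or "non-URM". Pairs involving a student with no label
    (e.g., unknown race/ethnicity) are skipped rather than imputed.
    """
    tally = Counter()
    for src, dst in edges:
        if src in group and dst in group:
            tally[(group[src], group[dst])] += 1
    return tally
```

Skipping unlabeled students mirrors the situation of the celebrity of unknown race/ethnicity in Course C. The alternative is to rerun the tally (or the model) once under each imputed label, as described above.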
C. Roles of final course grade, outspokenness, and section enrollment in shaping peer recognition
The remaining predictor variables in the model shed light on how final course grade, outspokenness on the online discussion forum, and section enrollment relate to the structure of our observed perception networks. All coefficient estimates for the final course grade of nominee variable, summarized in Table IV, are positive and statistically significant. That is, in all three courses and in both contexts, students with higher final course grades receive significantly more nominations than students with lower final course grades. The magnitudes of the coefficients also suggest that final course grade is a stronger predictor of receiving nominations in the lecture context than in the lab context in every course, though again such comparisons should be considered tentative [65]. We provide plots of the indegree distributions by final course grade in the Supplementary Material.
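Because the comparisons of coefficient magnitudes above depend on how ERGM estimates are scaled, it may help to recall the standard textbook form of an exponential random graph model. This is general background, not a restatement of this study's exact specification. Here y is the observed nomination network, g(y) the vector of network statistics (for example, a count of man → woman nominations, or the summed final course grades of nominees), θ the coefficient vector, κ(θ) the normalizing constant, and δ_ij(y) the change in g when the tie i → j is switched from absent to present while the rest of the network is held fixed:

```latex
P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
\qquad
\operatorname{logit} P_\theta\left(Y_{ij} = 1 \mid Y_{ij}^{c} = y_{ij}^{c}\right)
  = \theta^\top \delta_{ij}(y),
\qquad
\delta_{ij}(y) = g\left(y_{ij}^{+}\right) - g\left(y_{ij}^{-}\right).
```

Under this form, a one-unit increase in the k-th change statistic multiplies the conditional odds of the nomination i → j by exp(θ_k). Change statistics for a grade term and for a homophily term are measured in different units, so comparisons of raw magnitudes are best read as tentative, consistent with the caution above.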
With regard to outspokenness (the discussion forum contributions of nominee variable, which was only measurable for Courses B and C), we find that students who contribute more to the discussion forum are significantly more likely to receive nominations as strong in the lecture context. Contributions to the discussion forum, however, are not a significant predictor of receiving nominations as strong in the lab context. We provide plots of the indegree distributions by number of discussion forum contributions in the Supplementary Material.
We observe similar patterns across courses in the relationship between lab and discussion section enrollment and recognition among peers. Coefficient estimates for the homophily on lab section variable are positive and statistically significant in every observed network. Students are therefore more likely to nominate peers in their own lab section than peers outside it as strong in both lecture and lab content. Judging by the magnitudes of the coefficients (again, tentatively [65]), this effect is, unsurprisingly, more pronounced in lab perception networks than in lecture perception networks. On the other hand, coefficient estimates for the homophily on discussion section variable are positive and statistically significant in all three lecture perception networks, but they are not statistically significant in any of the three lab perception networks. This suggests that students tend to nominate peers in their discussion section as strong in the lecture material, but they do not systematically nominate discussion peers as strong in the lab material.
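To illustrate what a section-homophily term counts, the Python functions below count nominations within a shared section and compute the change statistic for a single candidate tie. The function names are hypothetical, and we assume the term behaves like a standard node-match statistic. The last function converts a coefficient into the multiplicative change in a tie's conditional odds under the standard ERGM interpretation. Any coefficient passed to it would be illustrative, not a value from Table IV.

```python
import math


def same_section(src, dst, section):
    """Change statistic: 1 if adding the tie src -> dst raises the homophily count."""
    return int(src in section and dst in section and section[src] == section[dst])


def homophily_count(edges, section):
    """Count nominations whose nominator and nominee share a section.

    Students missing from `section` (unknown enrollment) never count.
    """
    return sum(same_section(src, dst, section) for src, dst in edges)


def odds_multiplier(theta, delta):
    """Factor by which a tie's conditional odds change: exp(theta * delta)."""
    return math.exp(theta * delta)
```

A nomination within the same lab section has a change statistic of 1, so its conditional odds are scaled by exp(θ). A nomination across sections has a change statistic of 0 and is unaffected by this term.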

IV. DISCUSSION
In this study, we collected students' nominations of strong peers in three different remote, introductory physics courses with varying student populations. We advance previous work by measuring both gender and racial/ethnic biases and by differentiating perceptions related to lecture and lab contexts. The remainder of this section synthesizes our findings for each research question and concludes with recommendations for instruction and limitations of the study.
A. Mixed evidence of gender and racial/ethnic biases in recognition
Our analyses found mixed results regarding the presence or absence of gender bias in students' recognition of their peers. After adjusting for various measures reflecting structural tendencies of tie formation, women proportionately nominated their male and female peers in all courses and contexts (lecture and lab) except Course C's lecture perception network. In this network, women disproportionately nominated other women over men as strong in the lecture material. In contrast, men proportionately nominated their male and female peers in three out of six observed networks (the lab perception network of Course A and both networks of Course C) after controlling for other network configurations. Men significantly under-nominated their female peers in Course A's lecture perception network and in both perception networks of Course B. Recall that if we impute the second celebrity in Course C's lecture perception network as a man, men also significantly under-nominated their female peers in this network.
The results related to gender bias in lecture peer perceptions add insight to those found in prior work [30][31][32]. Across these studies, the courses vary by student population (majors, course level, and gender), instructional type (traditional and interactive lectures or non-traditional labs), and institution. This variability, understandably, leads to different conclusions in each study (including across the courses examined in our study). Contrary to expectations, a course's gender composition (whether gender-balanced, majority men, or majority women) and discipline (whether physical sciences and engineering or biology) do not seem to predict the presence or absence of a gender bias in students' recognition of their peers. Nor do the instructional style (traditional lecture, interactive lecture, or lab instruction) or the class size (whether the course enrolls 90 or 400 students).
One common factor that is consistently associated with gender bias across different studies, however, is the course level. Across the four studies, courses at the first-year level all exhibit gender biases, whereas those beyond the first-year level do not. The first-year courses are those in Ref. [30], Courses A and B in our study, and those in Ref. [32] (assuming, based on typical course enrollments, that the student populations in the lower-level courses are primarily first-year students). The beyond-first-year courses are those in Ref. [31] and Course C in our study. We posit that developing familiarity and friendship with peers in previous semesters allows a more diverse set of students to gain recognition in subsequent courses. Students in beyond-first-year courses, for example, have likely had more opportunities to showcase their knowledge or skills in front of their peers during prior courses they took together. In students' first introductory courses, in contrast, a gender bias in peer recognition aligned with sociohistorical stereotypes [12][13][14][15][16][17] persists before students get to know one another. Alternatively, this pattern could be due to selection effects in which only those women who received substantial recognition in their first course enrolled in subsequent courses. In that case, the gender bias may have occurred entirely in the first-year courses, creating unequal representation of students in the subsequent courses where we no longer find a bias. We note that this relationship between course level and gender bias in peer recognition is a tentative claim, given the celebrity of unknown gender in Course C's lecture perception network.
The modified analyses used in our study also clarify the nature of the gender bias in introductory STEM courses. As in the perceptions study in biology [30], we found that, when a gender bias in peer recognition existed, men under-nominated women, but women proportionately nominated both men and women. This result differs from that of Bloodhart and colleagues [32], however, who found that both men and women tend to under-nominate women. More details about the courses they analyzed would be needed to determine which, if any, course features led to these different results.
Our study also uniquely evaluated whether a racial/ethnic bias exists in students' recognition of peers. Race/ethnicity was not a significant predictor of nominations in either network of Course A or in the lab perception network of Course C. In both networks of Course B and in the lecture perception network of Course C (even when imputing the second celebrity of the latter network as a URM or non-URM student), however, URM students were more likely to receive nominations than their non-URM peers. This suggests that, once nomination probabilities are adjusted for the other variables in the model, URM students received more recognition than their non-URM counterparts. This holds despite the documented stereotypes against URM students in science [18][19][20][21][22][66] and indications that URM students report significantly lower senses of recognition than non-URM peers in their physics courses [36,39,44].
We discuss several possible explanations for these surprising findings. First, for the networks where we found no racial/ethnic bias (both networks of Course A and the lab perception network of Course C), one might expect low statistical power to explain the lack of a measurable effect, since URM students made up less than 30% of each analyzed course. This explanation is unlikely, however, given that we were able to statistically discern an effect in the other networks with comparable proportions of URM and non-URM students. Alternatively, we note that this study was fielded in the aftermath of the Black Lives Matter protests following the murder of George Floyd. Students (especially at Cornell University) were aware of the political climate [67,68]. This awareness might have heightened attention to racial/ethnic bias (compared to gender bias) and thus introduced social desirability biases into the responses. This explanation is plausible given that the effect on URM students' nominations is either null or in the opposite direction from what prior research would predict. Another explanation, particularly for the tendency of URM students to nominate URM peers in Course B, is friendship tie homophily. A host of research suggests that friendship serves as a mechanism for recognition [69,70] and that students tend to form friendships with peers of their same race/ethnicity [71,72]. URM students in Course B, therefore, might have formed friendships with one another and in turn recognized one another as strong in the course. Finally, in this study we measured actual recognition, whereas some prior work measures perceived recognition [36,39,44]. Students' actual recognition may differ from their perceptions of recognition, resulting in these different outcomes. For example, students from underrepresented groups may perceive lower recognition than they are actually given because of their awareness of existing stereotypes. We recommend that future research directly compare perceived and actual recognition across demographic groups to examine this possibility.

B. Recognition differs between lecture and lab contexts
Unlike previous studies, we probed peer perceptions related to lecture and lab contexts separately. We observed very different network structures across these contexts, with celebrities emerging in two out of three lecture perception networks but in none of the lab perception networks. We suspect that the structure of coursework in each context shaped the distribution of nominations. In the courses we analyzed, each lecture contained half or all of the enrolled students (depending on whether there were one or two lecture sections). The few highly motivated students (i.e., the celebrities) who frequently participated in lecture by answering or asking questions in front of the rest of the class likely gained considerable recognition from peers as strong in the lecture context. Because lectures were held on Zoom, students could also readily see the names of these celebrities and recall them on the survey. A similar phenomenon may have also occurred during online office hours, which were (anecdotally) very busy. Labs were also held on Zoom but used breakout rooms, so students were restricted to interacting with and seeing the names of just a few peers. Lab groups were also kept stable throughout the semester, allowing for meaningful yet limited recognition [37,73]. Discussion sections also used online breakout rooms, but they did not necessitate interaction between students: students submitted individual work, if at all, and some groups would leave their cameras and microphones off and work independently. Labs, on the other hand, required students to set up their experiments, collect and analyze data, and coordinate lab notes for a weekly group submission, all of which were negotiated through conversations.
Our findings pertaining to gender, moreover, suggest that students perceive male and female peers' expertise with lecture and lab material differently. The presence or absence of a gender bias in peer recognition varied between contexts in Course A, with a gender bias in lecture but not in lab perceptions. In Course B, there was a stronger gender bias in lecture perceptions than in lab perceptions. The lecture, but not lab, perception results mostly agree with prior work, which has found a gender bias in course-level perceptions [30,32]. We speculate that, as was the case in the study of Grunspan and colleagues [30], men may have been more verbally outspoken than women in the lecture sections (though we did not measure this). Indeed, it has been shown in introductory science courses that women are less comfortable participating in whole-class discussions than men [74] and that women respond less frequently than men to instructor-posed questions to the class [28,75]. In remote courses in particular, research has found that men participate more than women both verbally and in the chat window, and that students acknowledge chat messages from male peers more than those from female peers [76]. Recognition in labs, on the other hand, occurs among students working together in their lab groups. In our observed courses, lab groups were created based on a group-forming survey where students could indicate their preferences related to group work, for instance whether they wished to work with (or not work with) certain peers. Instructors intentionally created lab groups based on the survey and also avoided groups with isolated women. One possible explanation for observing less gender bias in labs, therefore, is that women had sufficient opportunities for recognition within their majority-women or all-women lab groups. Alternatively, students may hold different conceptions of what it means to be "strong in the lecture/discussion section material" and "strong in the lab material." This seems plausible given that we found more gender bias in the lecture context than in the lab context, even though stereotypes typically associate physics and masculinity with both technical skills (lab) and natural brilliance (lecture) [12][13][14][15][16][17]. To explore this possibility, we have modified the perceptions survey to also ask students to briefly explain their nominations. We will use these data to unpack the traits or behaviors students attribute to being strong in each context and whether this explains the difference in gender bias.
We also found that higher final course grades predict more received nominations from peers across all courses and contexts, in agreement with prior work [30,31]. Results related to discussion forum contributions and section enrollment, however, varied by context. We found that students' contributions to the course discussion forum were a significant predictor of receiving nominations in lecture, but not in lab, perception networks. We suspect that this difference occurred because most content posted on the discussion forum was related to lecture material rather than lab material. It appears that participation in the discussion forum served as a means of becoming more visible to and recognized by peers, but only with regard to the content being discussed. The lecture perception networks, moreover, contained emergent celebrities in the two courses using a discussion forum, Courses B and C, but not in Course A. Frequent visibility in the discussion forum, where students' names are explicitly tied to their contributions, might be a mechanism for students receiving many nominations as strong in their course. Similar to previous studies [30,31], we also observed that lab section enrollment is important for shaping peer perceptions in both contexts, while discussion section enrollment predicts only lecture perceptions. In other words, students learn about one another's strengths and acquire recognition related to lecture material in both discussion and lab sections, but they learn about one another's strengths related to lab material only in lab sections.

C. Recommendations for instruction
Together, our results related to gender and racial/ethnic bias in peer recognition point to courses (first-year level) and contexts (both lecture and lab) in which students from underrepresented backgrounds (mostly women) may receive less recognition than their peers and, therefore, may be at a disadvantage for developing their physics identity [37,42]. Because recognition is one of the most important dimensions of physics identity [18][34][35][36][37][38][39][40][41], instructors may support all students' identity development by facilitating more equitable peer recognition.
Our findings suggest that instructors, particularly of STEM courses at the first-year level, should aim to create opportunities for meaningful recognition in all aspects of a course. For example, research suggests that friendship and collaboration with peers are one mechanism for recognition: interacting with others allows students to showcase their knowledge and skills and acquire validation from others [69,77]. Opportunities for recognition, therefore, may be achieved through more student-centered instructional styles, where students work closely with one another in small groups [37]. Though small group work is already common in labs, lectures often emphasize individuals answering questions in front of the class. Poll questions and other active learning activities implemented in lectures may be turned into group discussions and group submissions rather than individual work. Further, if students are presenting group work in front of the whole class, instructors can create opportunities for positive recognition by allowing groups to discuss the ideas before asking them to share, which increases the likelihood that groups land on the correct answer [78].
In terms of forming groups, prior research suggests that historically underrepresented students (women and URM students) benefit from working in groups where they are not isolated [79,80]. One study, for example, found that gender-homogeneous and majority-women groups performed better than majority-men groups when solving physics problems [79]. The researchers observed that, in these majority-men groups, the men dominated the conversation, ignoring suggestions from their female peer. Our results agree with this work from a different perspective, namely recognition. We found that the gender bias in peer recognition was weaker in the lab context (where groups were intentionally formed to avoid isolated women) than in the lecture context (where any group work was completed in groups that were not intentionally formed). Our results, therefore, support the prior work recommending that underrepresented students (based on both gender and race/ethnicity) be placed in groups where they are not the minority. Previous research also suggests that students tend to become friends with, and therefore may be more prone to recognize, peers of similar academic achievement [70]. Intentionally forming groups that are heterogeneous in performance, therefore, might enable students at different levels of academic achievement to recognize each other.
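Instructors who want to check a proposed set of groups against the recommendation above could use a small helper along the lines of the Python sketch below, which flags groups with an isolated member of a given demographic group. This is a hypothetical sketch, not the procedure used by the instructors in our courses. The function name, the data layout (a group id mapped to a list of student ids), and the choice to ignore students with missing demographic data are all assumptions.

```python
def groups_with_isolated_member(groups, attribute, label="woman"):
    """Return ids of groups in which exactly one member has `label`.

    `groups` maps a group id to a list of student ids, and `attribute` maps
    student ids to a self-reported category (e.g., gender or URM status).
    Students missing from `attribute` are not counted toward `label`.
    """
    flagged = []
    for group_id, members in groups.items():
        n_label = sum(1 for m in members if attribute.get(m) == label)
        if n_label == 1 and len(members) > 1:
            flagged.append(group_id)
    return flagged
```

The same helper can be applied with `label="URM"` and a URM-status mapping, which reflects the recommendation that isolation be avoided along both gender and race/ethnicity.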
As to whether groups should be held the same or changed throughout the semester, our study cannot make a strong recommendation one way or another. Close and consistent collaboration with group members seems to allow students to overcome any implicit biases with which they enter the course [73]. Meanwhile, altering groups may be beneficial in allowing students to gain recognition from many of their peers.
Outside of group work, instructor-posed questions to the whole class can still provide many students with opportunities to gain recognition. For a given question, an instructor may ask for multiple volunteers to share their thinking before hearing from any individual student ("many hands") or randomly select individuals to share ("warm" or "cold" calling) [81]. The instructor can also explicitly ask for different volunteers each time. Instructors should also be cautious when praising students' answers in front of the whole class. If a student's answer is followed by "Perfect!" there is little room for other students to contribute or ask questions, which limits peers' exposure to the one student who volunteered.

D. Limitations
We end this section by acknowledging multiple limitations of our study. First and foremost, we collected our data during a global pandemic. Students' learning experiences were certainly impacted by this event [82][83][84][85][86] and, as a result, our findings may not be generalizable to physics instruction under normal circumstances. In addition, the courses analyzed here were almost entirely held online. While our results align with some of the previous work from in-person courses, future work should perform a more systematic comparison of peer perceptions in face-to-face and remote courses.
With regard to our methods, our perceptions survey may not have captured all nominations. We did not provide students with a list of names to look at when filling out the perceptions survey, so students may not have remembered the names of individuals they perceived as strong in the material. Additionally, we only collected survey responses at the midpoint of the semester. Other work administered surveys either at both the midpoint and endpoint of the course [30] or only at the endpoint of the course [31]. Future work comparing physics students' perceptions at multiple points in the course, or only at the end of the course, may add nuance to our findings.
We also performed text processing on the surveys to match the reported names to the class roster. This process dropped fewer than 5% of the reported nominations (for instance, due to students misspelling a peer's name). We note the possibility for bias in the text processing, because certain kinds of names might be less likely to be matched and thus more likely to be dropped from analysis. On one hand, students with common first or last names in the class might be less likely to be matched, particularly if only their first or last name is listed. Complicated names might be more prone to misspelling and, therefore, may also have a low probability of being matched. Rare first or last names, on the other hand, are more likely to be matched because they are unique. Whether certain kinds of students, for instance non-URM and URM students, have common or rare names might influence the representation of demographic groups in the data. Because we were able to match a high percentage of the data (more than 95%), however, we do not believe this potential bias impacted our study's results.
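The matching step can be illustrated with the Python sketch below, built on the standard-library `difflib` module. The rules it applies in order (exact full name, then a unique first or last name, then a single close spelling match) and the 0.8 similarity cutoff are assumptions chosen for illustration, not the text processing we actually applied. The function name is hypothetical. The sketch does reproduce the failure modes described above: a first name shared by two students is ambiguous and gets dropped, while a rare last name is matched.

```python
import difflib


def match_nomination(reported, roster, cutoff=0.8):
    """Map a free-text nomination to a unique roster name, or return None.

    Tries an exact (case-insensitive) full-name match, then a unique first- or
    last-name match, then a single close spelling match via difflib.
    Ambiguous or unmatched names return None and would be dropped.
    """
    text = reported.strip().lower()
    lowered = {name.lower(): name for name in roster}
    if text in lowered:
        return lowered[text]
    # A single token such as "Alex" is kept only if exactly one student has it.
    token_hits = [name for name in roster if text in name.lower().split()]
    if len(token_hits) == 1:
        return token_hits[0]
    if token_hits:
        return None  # common first or last name: ambiguous, so dropped
    # Misspellings: accept only when exactly one roster name is close enough.
    close = difflib.get_close_matches(text, list(lowered), n=2, cutoff=cutoff)
    if len(close) == 1:
        return lowered[close[0]]
    return None
```

For example, with a roster containing "Alex Kim" and "Alex Rivera", the nomination "Alex" returns None, whereas "Priya Natarajn" is matched to "Priya Natarajan".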
In addition, we categorized race/ethnicity in terms of URM status because the number of students in each racial/ethnic group was too small for our quantitative analysis to produce useful and interpretable results. However, this inevitably masks differences in recognition between students of individual racial/ethnic groups. Future research should seek to study more diverse student populations with statistically sufficient representation from all racial/ethnic groups. We also treated gender and race/ethnicity separately. It would be valuable for future work to determine whether gender and race/ethnicity are separately important for peer perceptions or whether it is the intersection of gender and race/ethnicity that significantly explains recognition patterns. Finally, peer recognition might depend on other variables that we did not measure or analyze in this study. For example, research suggests that students view their friends as strong in the course [69,70]. To determine whether this is the case, students' friendship ties could be collected and added as a predictor variable in the statistical model. Students' majors might also be related to recognition: perhaps students view peers in particular majors (e.g., STEM majors or physics majors in particular) as stronger in physics than peers in other majors (e.g., life sciences or non-STEM majors). We did not examine the relationship between student majors and recognition in this study for three reasons. First, the courses were quite homogeneous in major (in each course, most students were studying either engineering or physics). Second, many students did not report their sub-field within the engineering school. Third, some students had not yet declared a major. Future work should investigate whether and how friendship ties, students' majors, and other variables relate to students' perceptions of strong peers.

V. CONCLUSION
Examining students' nominations of strong peers, we found variation in gender and racial/ethnic biases across three different remote, introductory physics courses. We observed that the courses primarily serving first-year students exhibited a gender bias in lecture perceptions, while the course serving beyond-first-year students did not. Additionally, URM students were as likely as or more likely than their non-URM peers to receive nominations, contrary to what prior research would predict. Recognition also varied between lecture and lab contexts. Two of the three lecture perception networks contained a few central students receiving many nominations; however, no outstanding celebrities emerged in the lab perception networks. These results suggest that recognition varies across student populations and instructional contexts. Our findings also point to the advantages of instruction that emphasizes small group work and allows many different students to speak up in front of the class. These instructional efforts are hopefully first steps toward creating more widespread, rather than skewed, distributions of peer recognition, so that all students can develop their physics identities, a critical predictor of participation, persistence, and career intentions in physics and other science disciplines.
Table V shows the coefficient estimates obtained by applying the original model from Ref. [30] to our data. We include four additional variables to measure discussion section homophily and racial/ethnic bias. Figure 5 compares the goodness-of-fit metrics of the original model and our revised model (presented in the main text). The horizontal axis represents a network measure: outdegree (number of nominations made), indegree (number of nominations received), or edgewise shared partners (a measure of triadic closure). The vertical axis represents the proportion of students who display that network measure. Plots show the distribution of the network measures in the observed data (thick black line) and the distribution for 10 networks simulated from the model (boxplots). While the original model captures the outdegree and indegree distributions sufficiently, the revised model substantially improves how well the model captures the distribution of edgewise shared partners.

TABLE V: ERGM coefficient estimates using the original model of Grunspan and colleagues [30]. Standard errors of the coefficient estimates are in parentheses. Asterisks indicate statistical significance (* p<0.05; ** p<0.01). For every network, the goodness-of-fit metrics for the revised model are better than those for the original model (shown in Fig. 5).
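The edgewise shared partner distribution in Fig. 5 can be computed directly from an edge list. The Python sketch below is a hypothetical illustration (the function name is ours). For each directed nomination i → j, it counts the students k with both i → k and k → j, then reports the share of nominations with each count. This is one of several directed shared-partner conventions, and which convention the goodness-of-fit plots use is an assumption here.

```python
from collections import Counter


def esp_distribution(edges):
    """Share of directed edges i -> j by their number of shared partners.

    A partner k is counted when both i -> k and k -> j are present (the
    "outgoing two-path" convention). Self-nominations are assumed absent.
    """
    edge_set = set(edges)
    if not edge_set:
        return {}
    out_neighbors = {}
    for src, dst in edge_set:
        out_neighbors.setdefault(src, set()).add(dst)
    counts = Counter()
    for i, j in edge_set:
        partners = sum(
            1 for k in out_neighbors.get(i, ()) if k != j and (k, j) in edge_set
        )
        counts[partners] += 1
    total = len(edge_set)
    return {k: c / total for k, c in sorted(counts.items())}
```

Models that omit a triadic-closure term tend to simulate networks with too few edges that have one or more shared partners. This is the kind of mismatch that the shared-partner panel of a goodness-of-fit plot is designed to reveal.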

FIG. 1: Lecture perception networks. Nodes are colored by self-reported gender and sized proportional to indegree (number of received nominations). Nodes with bold outlines indicate celebrities (three in Course B and two in Course C). Edges point from the nominator to the nominee.

FIG. 2: Lab perception networks. Nodes are colored by self-reported gender and sized proportional to indegree (number of received nominations). Edges point from the nominator to the nominee.

FIG. 3: Plot of ERGM coefficient estimates for the gender nomination variables. The base term (i.e., coefficient estimate of zero) is nominations from man to man. Error bars are the standard errors of the coefficient estimates (values shown in Table IV), and asterisks indicate statistical significance.

FIG. 4: Plot of ERGM coefficient estimates for the URM nomination variables. The base term (i.e., coefficient estimate of zero) is nominations from non-URM to non-URM. Error bars are the standard errors of the coefficient estimates (values shown in Table IV), and asterisks indicate statistical significance.

TABLE II: Survey response rates and mean number of nominations made per student. The survey response rate is the percent of enrolled students who completed the survey. The mean number of nominations is the average number of peers' names that a student listed.

TABLE III: Network-level statistics for all observed networks. Density is the proportion of observed to possible edges. Indegree centralization is the extent to which the nominations are concentrated on one or a few students. Standard errors of the last digit are shown in parentheses.

TABLE IV: Exponential random graph model results. Coefficient estimates of predictor variables for each observed network. We fit models with all possible permutations of the gender and race/ethnicity variables serving as the base terms, but here we show results using nominations between majority demographic groups as the base terms (man → man for gender and non-URM → non-URM for race/ethnicity). Standard errors of the coefficient estimates are in parentheses. Asterisks indicate statistical significance (* p<0.05; ** p<0.01).