Who and what gets recognized in peer recognition

Previous work has identified that recognition from others is an important predictor of students' participation, persistence, and career intentions in physics. However, research has also found a gender bias in peer recognition in which student nominations of strong peers in their physics course disproportionately favor men over women. In this study, we draw on methods from social network analysis and find a consistent gender bias in which men disproportionately under-nominate women as strong in their physics course in two offerings of both a lecture course (for science and engineering, but not physics, majors) and a distinct lab course (for science, engineering, and physics majors). We also find in one offering of the lecture course that women disproportionately under-nominate men, contrary to what previous research would predict. We expand on prior work by also probing two data sources related to who and what gets recognized in peer recognition: students' interactions with their peers (who gets recognized) and students' written explanations of their nominations of strong peers (what gets recognized). Results suggest that the nature of the observed gender bias in peer recognition varies between the instructional contexts of lecture and lab. In the lecture course, the gender bias is related to who gets recognized: both men and women disproportionately over-nominate their interaction ties to students of their same gender as strong in the course. In the lab course, the gender bias is also related to what gets recognized: men nominate men more than women because of skills related to interactions, such as being helpful. These findings illuminate the different ways in which students form perceptions of their peers and add nuance to our understanding of the nature of gender bias in peer recognition.


I. INTRODUCTION
Gaining recognition as a physicist is important for students' participation and persistence in their physics course [1][2][3][4][5][6][7][8]. Recognition is particularly important for the participation of historically underrepresented groups in physics, such as women [3,7,9,10]. However, research has found that men both perceive higher recognition from others [2,7,11,12] and receive more recognition from their physics peers [13,14] than women. To better understand these effects, we investigated the nature of student recognition of strong peers with a focus on the gender bias in such peer recognition. Specifically, we probe two questions: whether and how gender bias in students' nominations is related to patterns of peer interactions (who gets recognized), and whether and how gender bias in students' nominations is related to their written explanations of these nominations (what gets recognized). Throughout the paper, we use the phrase gender bias to refer to a distinguishable difference between the amount of peer recognition received by men (women) and the amount of peer recognition we would expect men (women) to receive if recognition were distributed equitably across men and women.

A. Recognition in physics courses
A student's sense of physics identity, the degree to which they believe they are a "physics person" [15], has been shown to predict their participation, persistence, and career intentions in physics [2,4,7]. Researchers have modeled physics identity as containing three dimensions: performance and competence, interest, and recognition [3,4]. Previous studies demonstrate that recognition is the most important of these three dimensions in relating to and predicting student outcomes [1][2][3][4][5][6][7][8]. Recognition is the extent to which meaningful others (e.g., peers, teachers, and family) perceive an individual as a physics person. When a student receives more recognition from others, they are more likely to see themselves as a physics person and therefore develop a stronger physics identity [5,16].
Recognition from others, however, is often shaped by sociohistorical norms and stereotypes, such as those that position men as more suitable to the field of physics than women [3,7,9,17-26]. Perhaps as a result of such stereotypes, a handful of research studies demonstrate that men report higher perceived recognition (the extent to which they feel recognized by others) in their physics classes than women [2,7,11,12]. This difference may put men in a better position than women to develop their physics identity, contributing to the underrepresentation of women in physics documented in, for example, Refs. [27][28][29].

B. Gender bias in peer recognition
Other studies have probed "actual" peer recognition, rather than perceived recognition, by asking students to nominate peers they believe are strong in their science course [13,14,30,31]. We use the term "actual" recognition to mean a measure of how much others recognize an individual as a physics person, though the others may not necessarily have indicated that recognition to the individual. These studies largely draw on quantitative methods of social network analysis to determine the extent to which a gender bias exists in students' nominations of strong peers [14,30,31], finding mixed results. Grunspan and colleagues [30], for example, examined three offerings of an introductory biology course (the second in the course sequence) for first-year students. They observed that men disproportionately under-nominated women, while women proportionately nominated men and women, as strong in the course material in all three offerings. Bloodhart and colleagues [13] observed a similar gender bias in peer recognition across many introductory physics courses for first-year students, but found that women also disproportionately under-nominated women as strong in the course material. In the same study, researchers found that men proportionately nominated both men and women, but women disproportionately over-nominated other women, as strong in introductory life sciences courses for first-year students. Salehi and colleagues [31], however, found no gender bias in either men's or women's nominations of strong peers across two offerings of a mechanical engineering course taken by second- and third-year students.
Our previous work [14] examined peer recognition in three different remote physics courses and added nuance to these studies. We observed a gender bias in peer recognition favoring men (in which men disproportionately under-nominated women, but women proportionately nominated men and women) in two introductory physics courses aimed at first-year students, but no gender bias favoring men in an introductory physics course comprised mostly of second-year students (though women disproportionately over-nominated other women in this course). Comparing across all four studies [13,14,30,31], the presence or absence of a gender bias in peer recognition seems to vary by course level more than any other aspect of the instructional context: researchers find a gender bias in peer recognition in science courses for first-year, but not beyond-first-year, students. Gender bias in peer recognition also seems related to student outspokenness (i.e., verbally participating in class). In the two of these four studies that measure outspokenness, a gender bias in students' nominations of strong peers (favoring men) is present when there is also a gender disparity in who is outspoken (i.e., when men participate more than women), and there is no gender bias in peer recognition when there is no gender disparity in who is outspoken [14,30].
Patterns of peer recognition also seem to vary by instructional context, with our previous study of physics courses [14] finding more evidence of gender bias in peer recognition in the context of lecture material than lab material. This result may be attributable to these two instructional contexts covering distinct content and aiming to develop different sets of skills [32][33][34][35][36][37]. Indeed, research has shown that students believe lecture skills include knowing mathematics, while lab skills involve handling equipment and using technical skills [18,19]. However, the difference in peer recognition across instructional contexts may also be attributable to pedagogy: lectures typically contain many students who focus on the instructor and labs typically contain a small number of students who collaborate on tasks. This variation in how much visibility students have in front of their peers may impact patterns of peer recognition, especially during remote instruction as in our prior work [14]. In the current study, therefore, we determine the extent to which gender bias in peer recognition exists when lecture and lab material are taught in person and with a similar pedagogical style. Specifically, we analyze physics courses for first-year students where the lab and lecture material comprise distinct courses (i.e., students co-enroll in one lab course and one lecture course and receive a separate grade in each course), and both of these courses contain a lecture session where all students focus on the instructor and a small-group session where students solve problems or conduct experiments with their peers.
C. What is the nature of this gender bias?
While the quantitative studies mentioned above importantly determine the extent to which a gender bias in peer recognition exists, they do not probe the nature of this gender bias. Toward this end, separate threads of research have started to unpack two possible mechanisms underlying recognition: peer interactions (i.e., students learn about their peers' skills during interactions with these peers) and students' reasons for recognizing others as strong in their physics course (i.e., students recognize their strong physics peers for different skills). Of course, there may also be other explanations for the gender bias in peer recognition, such as sociohistorical gender stereotypes alone, but, in this study, we seek to build on the existing research threads.

Who gets recognized: Peer interactions and indirect observations of peers
Previous work suggests that one mechanism through which recognition forms is interactions with others [15,38,39].In their original conception of the identity framework, Gee states that "the modern need for recognition places a particular importance on discourse and dialogue...Individuals must win recognition for them through exchange with others" [15, p. 112-113].We interpret this to mean that an individual demonstrates their knowledge, skills, and personality traits through conversations with others, who then form perceptions of that individual as a certain kind of person.One study, for example, conducted interviews with undergraduate students to understand their experiences in a remote summer research program [39].The authors found that students' research group members and advisors started to recognize the students as physicists and researchers during interactions with one another: "Other recognition was supported by conversations between the mentee and other group members" [39, p. 10].
Similar work describes peer interactions as a way for students to determine who of their peers is strong in physics.In one study, researchers performed a longitudinal case study of a woman in physics named Cassidy [38].At the beginning of her undergraduate physics studies, Cassidy recognized her more senior peer tutor, one of the only other physics students with whom she interacted, as a "smart" physics student because they showed her an unnecessarily complicated solution to a physics problem.About a year later, however, Cassidy became a more outgoing member of the physics community who frequently collaborated with peers on assignments.These peer interactions facilitated Cassidy's understanding of the "multiplicity of ways to be 'good' at physics," [38, p. 12] such as bringing in different areas of expertise to a peer collaboration.She then recognized many of her peers as being strong physics students, rather than only her peer tutor.This and other studies [15,39], therefore, suggest that interactions are likely a mechanism for forming peer recognition: interactions facilitate students' understanding of their peers' knowledge and skill sets which informs who gets recognized.This mechanism of forming peer recognition may also relate to gender bias in peer recognition because previous work has found that students tend to interact with peers of their same gender [40,41].
Peer interactions, however, are not the only means through which students determine who gets recognized. Grunspan and colleagues [30], for example, demonstrate that outspokenness (frequent verbal participation in front of many others) also relates to which students receive peer recognition. They found that students who actively participated in lecture tended to receive more nominations from peers as strong in their biology course despite these students never directly interacting with one another. In addition to direct interactions with peers, therefore, students may determine who they recognize as a strong peer by indirectly observing their peers.
In the current study, we examine the relationship between peer interactions, indirect observations of peers, and peer recognition by quantitatively comparing students' self-reported peer interactions to their nominations of strong peers. We also compare this relationship across men's versus women's nominations to determine whether patterns of interactions relate to the nature of gender bias in peer recognition (we could not measure whether patterns of indirect observations relate to the nature of gender bias in peer recognition; see Sec. II C 3).

What gets recognized: Skill sets associated with being a physicist
Other studies have explicitly probed the knowledge, skill sets, and traits for which students recognize strong peers in their physics or other science courses [18,19,42-51]. Doucette and colleagues, for example, asked undergraduate physics students to describe their ideal lab partner [49]. The authors identified 13 characteristics from the responses that students recognized in a "good lab partner," including knowledgeable, hardworking, communicative, helpful, and efficient. In another study, Irving and Sayre interviewed upper-level physics students and asked them what they think it means to be a physicist [47]. The participants noted a wide array of skills or traits that they recognized in a physicist, including intuition for learning physics, interest in physics, solving physics problems, designing experiments, and collecting and interpreting experimental data. Some studies also relate the identified skills to gender. Danielsson [18], for instance, found that undergraduate students associate natural ability, tinkering with lab equipment, and mathematical competence with men and diligence and note-taking with women.
In the current study, we expand on this body of work by (i) collecting and analyzing a large number of student explanations of their nominations of strong peers and (ii) comparing the frequencies of explanations written by men and women when nominating men versus women to determine whether and how the explanations relate to gender bias in the nominations.

D. Current study
In summary, research has demonstrated that whether a gender bias exists in peer recognition varies across courses and instructional contexts. Prior work also suggests that peer interactions and differences in what skill sets are associated with being strong in physics might help to explain the nature of this gender bias. To probe these two possible mechanisms underlying who and what gets recognized in peer recognition, we conducted a mixed-methods study of in-person physics courses to answer the following research questions:
1. To what extent does a gender bias exist in students' recognition of strong peers within distinct lab and lecture courses?
2. Who gets recognized: In distinct lab and lecture courses, how (if at all) is gender bias in peer recognition related to patterns of peer interactions?
3. What gets recognized: In distinct lab and lecture courses, how (if at all) is gender bias in peer recognition related to the skill sets students associate with being strong in physics?
We collected students' nominations of strong peers, explanations for these nominations, and self-reported interactions with peers in two offerings of distinct introductory lab and lecture physics courses for first-year science and engineering students at Cornell University. Similar to prior research examining introductory biology and physics courses for first-year students [13,14,30], we find a gender bias in peer recognition in which men disproportionately under-nominate women compared to men in all analyzed courses. We also find that women disproportionately under-nominate men in one offering of the lecture course. Comparing the nominations of strong peers to peer interactions, we observe in most cases that the overall gender bias in peer recognition is related to gender bias in interaction-based recognition, where students disproportionately over-nominate their interaction ties to peers of their same gender. Finally, we find a difference in students' written explanations in the lab course, where men nominate men more than women because of the ways they interacted, such as being helpful, but not in the lecture course, where men and women nominate men and women for similar skill sets.

II. METHODS
In this section, we describe the instructional context of our study and then discuss our data collection and analysis methods.

A. Instructional context
The data come from two in-person offerings (fall and spring) of two distinct introductory physics courses (summarized in Table I), one lab course and one lecture course, at Cornell University, a large, private, PhD-granting institution in the northeastern United States with a Carnegie classification of very high research activity.
The lab course focused on developing experimental skills rather than reinforcing physics concepts (see, e.g., Refs. [34][35][36][37]) and covered topics in both mechanics and electromagnetism. For the lab course, students attended one 50 minute lecture session (instructed by a faculty member of the physics department) and one 2 hour lab session (instructed by a graduate teaching assistant and often a supporting undergraduate teaching assistant) each week. The lecture sessions of the lab course included active learning pedagogies, such as students answering poll questions in small groups. The course was split into two lecture sections per semester, each with 200-300 students in a large stadium-seating lecture hall. During the lab sessions, which contained 20-25 students each, students conducted open-ended investigations in small groups of two to four and each group submitted lab notes at the end of every session to be graded. Lab groups were formed by the teaching assistants based on student preferences from a group-forming survey and remained the same for the whole semester. In forming the groups, the teaching assistants were advised to avoid lab groups containing an isolated woman. Outside of class, students completed individual lab homework assignments using Jupyter Notebook each week [52]. There were also multiple office hours per week where students could receive individual help on course content from graduate and undergraduate teaching assistants or the main instructor.
Most students in the lab course were simultaneously enrolled in one of two calculus-based mechanics lecture courses: one intended for physics majors (the "physics majors" course) and one intended for engineering and other science majors (the "non-majors" course).In this paper, we only analyze the non-majors lecture course (200-500 students) because the physics majors course only contained 30 to 50 students.Students in this lecture course attended three 50 minute lecture sessions (instructed by a faculty member of the physics department) and two 50 minute discussion sessions (instructed by a graduate teaching assistant and often a supporting undergraduate teaching assistant) each week.This course used active learning pedagogies including a "flipped classroom" model, such that students read relevant sections of the textbook and took a reading quiz before attending lecture.During lecture sessions, which contained half of the enrolled students at a time (there were two lecture sections per semester) and took place in a large stadium-seating lecture hall, students answered conceptual poll questions in small groups.The course also made extensive use of interactive lecture demonstrations.In the discussion sessions, which contained about 20 students each, students completed physics problems in small groups of two to four but this work was not submitted for a grade.Discussion groups were not formed by the teaching assistants, rather students formed their own groups.Students typically worked with the same discussion group every week.Outside of class, students completed individual homework assignments (problem sets) each week.There were multiple office hours per week where students typically worked together on the homework assignments with the help of graduate and undergraduate teaching assistants.
In this study, all analyzed students in the lecture course were co-enrolled in the lab course. Therefore, it was possible for students to be surrounded by some of the same peers in both courses: the lecture and lab sessions of the lab course and the lecture and discussion sections of the lecture course. Between 20% and 40% of students in the lab course (depending on the semester), however, were not co-enrolled in the lecture course we analyze.

B. Data collection
We administered an online network survey as part of a homework assignment in the lab course in the middle of the 15-week semester (see Fig. 1). On the survey, we distinguished peer recognition in the lab and lecture courses because our prior work identified that patterns of peer recognition varied between these instructional contexts [14]. Specifically, we asked students to nominate peers in each course who they believed were knowledgeable about the course material [13,14,30,31] as a measure of their recognition of strong peers. We also asked students to describe why they nominated their peers.
A second set of questions asked students to self-report peers with whom they had meaningful interactions about the instructional material in each course [41,53-56]. As in prior work, "students self-identified what counted as a meaningful interaction" [56, p. 6]. We asked students about whom they interacted with "this week" to capture interactions that students were consistently having with their peers throughout the semester, while reducing the possibility of recall bias (e.g., by asking them to recall all peers with whom they have interacted throughout the semester). This phrasing may have captured a few one-off interactions that only occurred the week of the survey; however, these likely represent a small fraction of the reported interactions.
Each question was in an open response format, where students entered each peer's name in a separate text box and the associated explanation for each peer also in a separate text box.Students could enter up to 15 peers' names for each prompt, though no student provided the maximum number of names for any prompt.Students were also given access to the course rosters to facilitate their remembering and spelling of peers' names.Students could nominate anyone in their course; for example, they were not restricted to naming peers in their specific lecture section.
At least 95% of enrolled students in each course responded to the survey (see Table I). Students occasionally misspelled peers' names and/or reported just a first or a last name. In these cases, the first author manually processed the text to match the names in the survey responses to the course roster when possible. We could not match a name if the respondent provided only a first (or last) name and multiple students in the course had that first (or last) name; these responses were dropped from the data set. In each course, we were able to match at least 90% of the nominations of strong peers and self-reported interactions to the course roster.
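The matching rule described above (keep a partial name only when it identifies exactly one student on the roster, otherwise drop it) can be sketched in code. This is an illustration only: the matching in this study was done manually, the roster names below are hypothetical, and the sketch does not attempt to correct misspellings.

```python
from collections import defaultdict


def build_name_index(roster):
    """Map each lowercase full, first, and last name to the roster entries carrying it."""
    index = defaultdict(set)
    for full_name in roster:
        parts = full_name.lower().split()
        index[full_name.lower()].add(full_name)
        index[parts[0]].add(full_name)   # first name
        index[parts[-1]].add(full_name)  # last name
    return index


def match_name(response, index):
    """Return the unique roster name for a response, or None if absent or ambiguous."""
    candidates = index.get(response.strip().lower(), set())
    return next(iter(candidates)) if len(candidates) == 1 else None


# Hypothetical roster, for illustration only.
roster = ["Alex Kim", "Alex Rivera", "Jordan Lee", "Sam Patel"]
index = build_name_index(roster)

matched_unique = match_name("Jordan", index)   # only one Jordan on the roster
matched_ambiguous = match_name("Alex", index)  # two students share this first name
```

Under this rule, the ambiguous response is dropped rather than guessed, which mirrors the conservative choice described in the text.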
Our analysis included all students who responded to the survey and/or were listed by at least one peer on a given survey prompt. Our analysis also included only the nominations and self-reported interactions made by students who consented to participate in research (more than 95% of survey responders). If a consenting student wrote the name of a non-consenting student, we included the survey response, but removed all information (e.g., demographics) about the non-consenting student. We were able to apply social network analysis methods to our data because both the survey response rate and the name matching rate (from the raw survey responses to the course roster) were at least 90% and very few (<2%) non-consenting students were removed from analysis, and network methods are reliable for data sets with less than 30% missing data [57].

FIG. 2: Flowchart depicting our stages of data analysis. (1) Recognition network structure: characterizing patterns of peer recognition with network diagrams and descriptive statistics. (2) Exponential random graph models: testing variables related to gender bias in peer recognition. (3) Peer interactions: quantifying the relationship between peer recognition and peer interactions by student gender. (4) Explanations: categorizing written explanations for nominations and comparing by student gender.
We also collected students' self-reported gender, race or ethnicity, intended major, and academic year on the survey (see Table I). Most students in the data set intended to major in engineering and the majority were in their first academic year. Each offering of the lab and lecture course contained roughly equal proportions of men and women. We grouped race or ethnicity by underrepresented racial minority (URM) status, where non-URM students are those solely identifying as White and/or Asian or Asian American and URM students are those identifying as at least one of any other race or ethnicity (including American Indian or Alaska Native, Black or African American, Hispanic or Latinx, and Native Hawaiian or other Pacific Islander). The majority of students (>60%) in each course were non-URM. Because the role of race or ethnicity was not part of our research questions, this categorization provided a limited ability to control for possible effects of race or ethnicity in our evaluation of the role of gender. We acknowledge, however, the limitations of this categorization [58] and encourage future work to probe this variable explicitly.
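The URM grouping rule stated above can be written as a short function. This is a sketch: the category strings and the handling of missing responses (returned here as "unknown") are assumptions made for illustration, not details taken from the survey instrument.

```python
# Categories treated as non-URM in the grouping described above.
NON_URM_CATEGORIES = {"White", "Asian or Asian American"}


def urm_status(identities):
    """Classify a student from the set of race or ethnicity categories they selected.

    Non-URM: solely White and/or Asian or Asian American.
    URM: at least one other category selected.
    An empty response is labeled "unknown" (an assumption for this sketch).
    """
    selected = set(identities)
    if not selected:
        return "unknown"
    return "non-URM" if selected <= NON_URM_CATEGORIES else "URM"


status_a = urm_status(["White", "Asian or Asian American"])
status_b = urm_status(["White", "Hispanic or Latinx"])
```

Note that a student who selects White together with any other category outside the non-URM set is grouped as URM, following the "at least one of any other race or ethnicity" wording.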
At the end of the semester, we collected students' discussion and lab section enrollment, lab groups, and final grades in each course.

C. Data analysis
We conducted our data analysis in four stages (summarized in Fig. 2), largely drawing on methods of social network analysis [55,59,60].

Recognition network structure
We first analyzed the four recognition networks, one for each offering (fall and spring) of each course (lab and lecture), using student responses to the first two questions on the survey (Fig. 1). Similar to prior work [14,30,31], we converted the nominations of strong peers into directed networks (see Fig. 4) to identify broad patterns of peer recognition. Nodes in the network represented students and edges (or ties) in the network represented all nominations made between students (including direction, from the nominator to the nominee). We distinguish non-binary students from men, women, and students of unknown gender in the network diagrams (Fig. 4) to visualize these students' positions in each network. However, non-binary students are not distinguished in the remainder of the analysis because they make up 1% or less of the student population in each course (Table I).
To characterize the structures of the observed networks, we calculated three network-level statistics (density, indegree centralization, and transitivity) for each network. Density is the number of edges in the network that we observed as a fraction of the number of possible edges in the network. Indegree centralization measures the extent to which the nominations are concentrated around a single student or a small subset of students. This measure is calculated as the sum of differences in indegree (number of received nominations) between the node with the highest indegree (receiving the most nominations) and every other node in the network, divided by the maximum possible sum of differences of indegree for all nodes. Higher indegree centralization (i.e., closer to one) indicates higher concentration of nominations around one or a few students (i.e., "celebrities" [30] who receive many more nominations than their peers). Finally, transitivity measures the tendency of nodes to cluster together and is calculated as the proportion of two-paths (two edges connecting three nodes) that have a third edge closing the triangle, not considering edge direction. If node A is connected to node B and node C, for example, an edge between nodes B and C would form a triangle. A higher proportion of such triangles would lead to higher transitivity values (i.e., closer to one).
We determined the standard errors of each of these statistics via bootstrapping: resampling the observed network many times, calculating the statistic of each sampled network, and then determining the standard deviation of the statistic among all of the sampled networks [54,61]. The bootstrapping was performed with 5,000 bootstrap trials for each network using the snowboot package in R [62].
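As an illustration of these three definitions, the following sketch computes them for a small directed network stored as a list of (nominator, nominee) pairs. The node labels and edges are hypothetical, and the normalization of indegree centralization by (n − 1)², the value attained when every other student nominates a single peer, is an assumption about the convention used for directed networks without self-nominations.

```python
from itertools import combinations


def network_statistics(nodes, edges):
    """Return (density, indegree centralization, transitivity) of a directed network."""
    n = len(nodes)
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-nominations

    # Density: observed edges as a fraction of the n(n-1) possible directed edges.
    density = len(edge_set) / (n * (n - 1))

    # Indegree centralization: summed gaps to the top indegree, normalized by
    # the maximum possible sum, assumed here to be (n-1)^2.
    indegree = {node: 0 for node in nodes}
    for _, nominee in edge_set:
        indegree[nominee] += 1
    top = max(indegree.values())
    centralization = sum(top - d for d in indegree.values()) / ((n - 1) ** 2)

    # Transitivity: fraction of two-paths closed by a third edge, ignoring direction.
    neighbors = {node: set() for node in nodes}
    for u, v in edge_set:
        neighbors[u].add(v)
        neighbors[v].add(u)
    two_paths = closed = 0
    for center in nodes:
        for a, b in combinations(sorted(neighbors[center]), 2):
            two_paths += 1
            closed += b in neighbors[a]
    transitivity = closed / two_paths if two_paths else 0.0

    return density, centralization, transitivity


# Hypothetical example: three students nominate B, and A also nominates C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("A", "C")]
density, centralization, transitivity = network_statistics(nodes, edges)
```

In this toy network, 4 of 12 possible edges are present (density 1/3), student B receives three of the four nominations (centralization 8/9), and 3 of the 5 two-paths are closed (transitivity 0.6).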
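The three bootstrapping steps (resample, recompute, take the standard deviation) can be sketched as follows. This is not the algorithm implemented in the snowboot package; it is a simplified node-resampling scheme, chosen here as an assumption for illustration, applied to density only, with a hypothetical toy network and fewer trials than the 5,000 used in the study.

```python
import random
import statistics


def density_of_sample(edges, sample):
    """Density among resampled nodes; pairs that repeat the same node are skipped."""
    edge_set = set(edges)
    present = possible = 0
    for i, u in enumerate(sample):
        for j, v in enumerate(sample):
            if i != j and u != v:
                possible += 1
                present += (u, v) in edge_set
    return present / possible if possible else None


def bootstrap_standard_error(nodes, edges, trials=1000, seed=0):
    """Standard deviation of density across node-resampled copies of the network."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(trials):
        sample = rng.choices(nodes, k=len(nodes))  # resample nodes with replacement
        value = density_of_sample(edges, sample)
        if value is not None:  # skip degenerate samples containing a single node
            values.append(value)
    return statistics.stdev(values)


# Hypothetical toy network, for illustration only.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("A", "C")]
standard_error = bootstrap_standard_error(nodes, edges)
```

The same loop applies to any network statistic: replace `density_of_sample` with a function that computes the statistic of interest on the resampled network.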

Exponential random graph models
We determined the extent to which a gender bias exists in each observed recognition network using exponential random graph models (ERGMs). Such models assume that an observed network is a realization from a random graph that comes from a distribution belonging to the exponential family [63,64]. ERGMs allow us to perform many statistical tests at once, determining whether the frequencies of certain configurations (e.g., ties between students of the same gender) in our observed network are significantly different than if the ties were formed randomly. The goal is to use these $k$ configurations $g_k(y)$ and their corresponding coefficients $\theta_k$ to predict the formation of the random network $Y$. The model takes the form

$$P_\theta(Y = y) = \frac{\exp\left(\sum_k \theta_k g_k(y)\right)}{\sum_{y'} \exp\left(\sum_k \theta_k g_k(y')\right)},$$

where $y$ is a realization of the random network $Y$ and the denominator, a sum over all possible networks $y'$ on the same set of nodes, serves as a normalization constant that ensures that the probability sums to one. Given an observed network $y$, the coefficients of the model are estimated using Maximum Likelihood Estimation (MLE). Due to the dependence between the network ties, the MLE is commonly approximated with Markov Chain Monte Carlo (MCMC) techniques [65]. The coefficients $\theta_k$ represent log-odds of tie formation and can be interpreted as a weighting of the importance of each modeled configuration for the realized network, where positive (negative) coefficients show that the configuration is observed more (less) frequently than by chance after accounting for all other configurations that are modeled.
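To make the normalization constant and the log-odds interpretation concrete, the sketch below evaluates an ERGM by brute force on a three-node directed network, where all 64 possible networks can be enumerated. The two configurations (edges and reciprocity) and the coefficient values are hypothetical choices for illustration; networks of realistic size cannot be enumerated this way, which is why MCMC approximations are used in practice.

```python
import itertools
import math

NODES = (0, 1, 2)
DYADS = [(i, j) for i in NODES for j in NODES if i != j]  # 6 possible directed ties


def configurations(network):
    """Return (number of edges, number of mutual pairs) for a set of directed ties."""
    edges = len(network)
    mutual = sum(1 for (i, j) in network if i < j and (j, i) in network)
    return edges, mutual


def all_networks():
    """Yield every possible directed network on NODES (2**6 = 64 in total)."""
    for bits in itertools.product((0, 1), repeat=len(DYADS)):
        yield frozenset(d for d, b in zip(DYADS, bits) if b)


def weight(network, theta):
    """Unnormalized weight exp(sum_k theta_k * g_k(y))."""
    return math.exp(sum(t * g for t, g in zip(theta, configurations(network))))


def ergm_probability(network, theta):
    """Probability of a network under the ERGM, normalizing over all networks."""
    normalizer = sum(weight(other, theta) for other in all_networks())
    return weight(network, theta) / normalizer


theta = (-1.0, 2.0)  # hypothetical coefficients: (edges, reciprocity)

total = sum(ergm_probability(net, theta) for net in all_networks())

# Log-odds of adding the tie 0 -> 1 when the reverse tie 1 -> 0 already exists:
without_tie = frozenset({(1, 0)})
with_tie = frozenset({(1, 0), (0, 1)})
log_odds = math.log(ergm_probability(with_tie, theta) / ergm_probability(without_tie, theta))
```

Adding the tie changes the edge count by one and the mutual-pair count by one, so the log-odds equal the sum of the two coefficients, here −1.0 + 2.0 = 1.0, which illustrates how each coefficient weights its configuration.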
In our study, we fit an ERGM to each observed network using a similar set of configurations, or predictor variables, as our prior work [14]. For the lab course, we added two new variables. The first variable measured the tendency for students to nominate peers in their immediate lab group given prior work that suggests students often report connections to their group members on network surveys [66]. The second variable measured the tendency for students to nominate peers enrolled in their same lecture course (the separate course structure differs from that in Ref. [14]). We also only measured discussion section homophily in the lecture course because our prior research found no significant tendency for students to nominate peers in their discussion section as strong in the lab material [14] and the discussion sections are now even further removed from lab material given the distinct courses. Different from our previous work, students received separate final course grades in the lab and lecture courses rather than one overall course grade that encompassed lab and lecture content. Therefore, we used the lab course final grades in the ERGMs for the lab course recognition networks and the lecture course final grades in the ERGMs for the lecture course recognition networks. Finally, we did not include a variable measuring transitivity in the models as we did in previous work [14] because the MCMC MLE did not converge with this variable added. The goodness-of-fit diagnostics, however, showed that our model sufficiently captured the distributions of indegree, outdegree (number of nominations reported by each student), and transitivity for all four observed networks (see Fig. 8 in the Appendix).

The following predictor variables were included in our model:
• Edges: intercept term equal to the number of edges in the network
• Reciprocity: number of mutual nominations (i.e., student A nominates student B and student B nominates student A)
• Woman → woman: number of edges for which a woman nominates another woman (base term is man → man)
• Woman → man: number of edges for which a woman nominates a man (base term is man → man)
• Man → woman: number of edges for which a man nominates a woman (base term is man → man)
• URM → URM: number of edges for which a URM student nominates a URM student (base term is non-URM → non-URM)
• URM → non-URM: number of edges for which a URM student nominates a non-URM student (base term is non-URM → non-URM)
• Non-URM → URM: number of edges for which a non-URM student nominates a URM student (base term is non-URM → non-URM)
• Physics majors → physics majors (lab course only): number of edges for which a student in the "physics majors" lecture course nominates another student in the "physics majors" lecture course (base term is non-majors → non-majors)
• Physics majors → non-majors (lab course only): number of edges for which a student in the "physics majors" lecture course nominates a student in the "non-majors" lecture course (base term is non-majors → non-majors)
• Non-majors → physics majors (lab course only): number of edges for which a student in the "non-majors" lecture course nominates a student in the "physics majors" lecture course (base term is non-majors → non-majors)
• Lab group homophily: number of edges connecting students in the same lab group
• Lab section homophily: number of edges connecting students enrolled in the same lab section
• Discussion section homophily (lecture course only): number of edges connecting students enrolled in the same discussion section
• Grade of nominee: correlation between final course grade and number of received nominations

We used the coefficient estimates of the woman → woman, woman → man, and man → woman variables for the four observed recognition networks to determine whether a gender bias exists in student nominations of strong peers after adjusting for the other network configurations included in the model. While we only focus on these gender variables in this paper, we keep the other variables in the model to account for as many different aspects of students' identity and participation in the course as possible and because an exploratory analysis indicated that removing these other variables can change the results for the gender variables [67]. In particular, a few of the predictor variables (e.g., lab group homophily, lab section homophily, and discussion section homophily) explicitly control for patterns of student interactions, allowing us to identify whether a gender bias in students' recognition of strong peers exists even after accounting for any strong interaction trends (e.g., gender homophily) [41]. Additionally, we note that the final four predictor variables listed above (related to lab group, lab section, discussion section, and grade) cannot handle unknown data. Therefore, only students with known data for these four variables were included in the ERGM analysis. While this predominantly restricted analysis to students who completed the course (i.e., students who did not have a final course grade likely dropped or withdrew from the course after we administered the survey), the ERGM analysis still included more than 90% of students that are part of our overall analysis. Thus, the statistical models provide an accurate description of most students in the class. We recommend that future work investigate the network positionality of students who do not complete their physics course.
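Several of the predictor variables listed above are simple counts of configurations in the observed network. The sketch below illustrates how such counts could be tallied for a small directed network. The student labels, genders, lab groups, and nominations are hypothetical, and the function names are our own for illustration; this is not the software used to fit the models.

```python
def mixing_counts(edges, attribute):
    """Count edges by (nominator attribute, nominee attribute), e.g., man -> woman."""
    counts = {}
    for src, dst in edges:
        key = (attribute[src], attribute[dst])
        counts[key] = counts.get(key, 0) + 1
    return counts

def reciprocity_count(edges):
    """Count mutual nominations, with each mutual pair counted once."""
    edge_set = set(edges)
    return sum(1 for (a, b) in edge_set if a < b and (b, a) in edge_set)

def homophily_count(edges, group):
    """Count edges connecting two students who share the same group label."""
    return sum(1 for src, dst in edges if group[src] == group[dst])

# Hypothetical toy data: four students and five nominations.
gender = {"A": "man", "B": "woman", "C": "man", "D": "woman"}
lab_group = {"A": 1, "B": 1, "C": 2, "D": 2}
nominations = [("A", "C"), ("C", "A"), ("A", "B"), ("B", "D"), ("D", "C")]

n_edges = len(nominations)
gender_mixing = mixing_counts(nominations, gender)
n_mutual = reciprocity_count(nominations)
n_same_lab_group = homophily_count(nominations, lab_group)
```

In an ERGM, each of these counts would enter the model as one configuration with its own coefficient, with the man → man count serving as the base term for the gender variables.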
We also note that some sample sizes, particularly for URM students, seem too small to make statistical comparisons with our models (Table I). ERGMs, however, consider edges rather than nodes as the unit of analysis. Though the number of URM students (i.e., nodes) may be small, the networks we study include many of the possible edges between students (Table II and Fig. 4). Smaller sample sizes, furthermore, do not prevent valid estimation of the coefficient values. Instead, they are reflected in the standard errors and p-values of the coefficients [68]. Quantitative modifications to ERGMs are only necessary for very small networks (fewer than six nodes) [69].
We finally note that students' final grades in the lab course were fairly skewed, with many students earning an A or A-. This may introduce issues of range restriction for the grade of nominee term, where low variability in students' final course grades limits the possibility of finding a significant correlation between grades and received nominations. We find in both offerings of the lab course, however, that the model is able to distinguish a significant effect of course grade on peer recognition (Table V).

Peer interactions
To measure the extent to which peer interactions are related to peer recognition, we converted students' self-reported interactions (last two questions on the survey, see Fig. 1) into directed networks. Similar to our analysis of the recognition network structure, we first calculated the density, the proportion of possible edges in the network that we observe, of each interaction network. For each offering of each course, we compared the recognition and interaction network densities to determine whether there were comparable numbers of edges in both networks or if one of the two networks contained many more edges than the other. This comparison was observational and not statistical.
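As a brief illustration of the density measure, the sketch below assumes the standard definition for a directed network without self-ties, in which n nodes admit n(n − 1) possible edges. The node count and edges are hypothetical.

```python
def directed_density(n_nodes, edges):
    """Proportion of the n(n - 1) possible directed edges that are observed."""
    possible = n_nodes * (n_nodes - 1)
    return len(set(edges)) / possible if possible else 0.0

# Hypothetical network: four students and three directed ties.
toy_density = directed_density(4, [(0, 1), (1, 0), (2, 3)])  # 3 of 12 possible edges
```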
For a more interpretable metric, we calculated the percent overlap, the percent of directed edges in the recognition network that also appear in the corresponding interaction network (see Fig. 3 for example), for each course. This measure allowed us to determine the extent to which peer recognition was interaction-based or observation-based. Higher percent overlap values indicate that most of students' nominations of strong peers were interaction-based recognition: many students nominated peers with whom they also reported interacting. For interaction-based recognition, we assume that the nominator came to understand the nominee's skill set through their direct interactions (e.g., talking to immediate group mates during lab). Lower percent overlap values, on the other hand, indicate that most of students' nominations of strong peers were observation-based recognition: many students nominated peers with whom they did not report interacting. For observation-based recognition, we assume that the nominator came to understand the nominee's skill set through their indirect observations of them (e.g., seeing someone frequently participate in lecture) rather than direct interactions. We also calculated gender homophily as the percent of edges in the interaction network where both the nominator and the nominee are of the same gender (i.e., edges from men to men and edges from women to women). Research has shown that gender homophily is prevalent in student interaction networks [40,41,70], thus this measure helped us understand patterns between the interaction and recognition networks related to gender, discussed next.

[Figure: two panels labeled Interactions and Recognition]
FIG. 3: Toy interaction and recognition networks to exemplify how we calculated percent overlap and fraction of interaction network edges kept in the recognition network. Black edges indicate edges that appear in the interaction network but not the recognition network, blue edges indicate edges that appear in both the interaction and recognition networks (edges kept), and orange edges indicate edges that appear in the recognition network but not the interaction network. In this case, the percent overlap is 1/2 because three out of six recognition network edges also appear in the interaction network. The fraction of interaction network edges kept in the recognition network is 1/3 because three out of nine interaction network edges also appear in the recognition network.
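The percent overlap and gender homophily measures can be sketched as follows. The edge lists here are hypothetical: they are constructed only to match the counts quoted in the caption of Fig. 3 (nine interaction edges, six recognition edges, three shared), not the actual layout of that figure, and the gender labels are likewise invented for illustration.

```python
def percent_overlap(recognition, interaction):
    """Percent of recognition edges that also appear in the interaction network."""
    shared = set(recognition) & set(interaction)
    return 100 * len(shared) / len(set(recognition))

def gender_homophily(interaction, gender):
    """Percent of interaction edges whose two endpoints report the same gender."""
    edges = set(interaction)
    same = sum(1 for src, dst in edges if gender[src] == gender[dst])
    return 100 * same / len(edges)

# Hypothetical directed edges as (reporting student, named peer).
interaction = {(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (2, 4), (5, 1), (5, 6), (6, 5)}
recognition = {(1, 2), (3, 4), (5, 6), (1, 6), (2, 5), (4, 1)}
gender = {1: "man", 2: "man", 3: "woman", 4: "woman", 5: "man", 6: "woman"}

overlap = percent_overlap(recognition, interaction)   # 3 of 6 recognition edges
homophily = gender_homophily(interaction, gender)     # 5 of 9 interaction edges
```

Under the interpretation above, the three shared edges would be read as interaction-based recognition and the remaining three recognition edges as observation-based recognition.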
Finally, we compared the extent to which any gender bias in peer recognition was specifically related to a gender bias in interaction-based recognition (i.e., if there was a gender bias in which of students' interaction ties they also nominated as strong in the course). This analysis only included edges for which both the nominator and the nominee self-reported their gender as either man or woman (859 out of 1,000 total nominations of strong peers and 1,590 out of 1,789 total self-reported interactions across all four courses). For every possible combination of men and women nominating each other (i.e., man nominating a man, man nominating a woman, woman nominating a man, and woman nominating a woman), we calculated the fraction of interaction network edges kept in the recognition network as the number of directed edges that appear in both the interaction and recognition networks divided by the number of directed edges in the interaction network (see Fig. 3 for example). We compared this measure by student gender in each network. We did not perform statistical tests because the goal of this analysis was to determine large-scale trends in the measure and relying on p-values can be problematic [71][72][73]. Statistical tests of distinguishability would involve many comparisons that increase the risk of finding apparent statistical significance due only to chance. Instead, we use overlap in error bars (given by standard errors) to make qualitative interpretations about differences in the measure between men and women and do not comment on the possible distinguishability of small effects. While this approach is more appropriate than statistical testing, we acknowledge that using error bars may come with its own set of limitations [74].
We note that we could not measure the extent to which any gender bias in peer recognition was related to a gender bias in observation-based recognition. An appropriate measure of such a bias would be to calculate the fraction of peers that students indirectly observed, but did not interact with, that they nominated as strong in the course material. However, we did not collect data about which peers students indirectly observed; all we know is who they do not interact with. In large classes, such as the ones we analyze here, it is not reasonable to assume students indirectly observed all of the peers with whom they did not interact. We recommend that future work investigate gender bias in observation-based recognition by examining peer recognition in small courses (where it is reasonable to assume students have the chance to indirectly observe all of their peers) or by also collecting data about peers with whom students are familiar but did not directly interact.
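A minimal sketch of the fraction of interaction edges kept, split by nominator and nominee gender, is given below. Two points are assumptions on our part rather than details stated in the text: the standard error is taken to be the usual binomial standard error of a proportion, sqrt(p(1 − p)/n), and the toy edges and gender labels are hypothetical.

```python
from math import sqrt

def fraction_kept_by_gender(interaction, recognition, gender):
    """Map (nominator gender, nominee gender) -> (fraction kept, standard error, n).

    Only edges whose two endpoints both report "man" or "woman" are counted.
    """
    recognition = set(recognition)
    totals, kept = {}, {}
    for src, dst in set(interaction):
        pair = (gender.get(src), gender.get(dst))
        if not all(g in ("man", "woman") for g in pair):
            continue  # skip edges with unknown or other self-reported gender
        totals[pair] = totals.get(pair, 0) + 1
        if (src, dst) in recognition:
            kept[pair] = kept.get(pair, 0) + 1
    result = {}
    for pair, n in totals.items():
        p = kept.get(pair, 0) / n
        # Assumed binomial standard error of a proportion.
        result[pair] = (p, sqrt(p * (1 - p) / n), n)
    return result

# Hypothetical toy data (student 7 did not report a gender).
interaction = {(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (2, 4), (5, 1), (5, 6), (6, 5), (7, 1)}
recognition = {(1, 2), (3, 4), (5, 6), (1, 6), (2, 5), (4, 1), (7, 1)}
gender = {1: "man", 2: "man", 3: "woman", 4: "woman", 5: "man", 6: "woman"}

kept_fractions = fraction_kept_by_gender(interaction, recognition, gender)
```

Because each fraction is normalized by the number of interaction edges for that gender pair, the measure is not inflated simply because students interact more often with peers of their own gender.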

Explanations
We conducted a thematic coding analysis of student responses to the survey prompt, "Please briefly explain why you chose this student as strong in the course material," to identify what gets recognized in peer recognition, that is, the skill sets for which students recognized their strong physics peers. The first author initially read all responses to gain a sense of the data as a whole [75]. Upon recognizing similar themes to those identified in prior research [18,19,42-51], the first author drafted an a priori codebook informed by these themes. The research team then iteratively developed this codebook by individually coding a subset of the data and then meeting to modify the code definitions based on coding disagreements [76]. We also grouped together similar codes into four overarching categories: knowledge, processes, interactions, and other. These categories were inspired by those in Ref. [77], in which the authors categorized students' problem solving skills as related to knowledge, processes, and beliefs.
After the coding scheme was agreed upon, three members of the research team coded a stratified random sample of 10% of the 1,000 total explanations in our data set. We stratified the random sample by course (lab and lecture) because the two courses had different learning objectives and course structures, which might have led students to associate different skill sets with being strong in each course. Therefore, half of the random sample contained explanations from the lab course and the other half contained explanations from the lecture course. We determined interrater reliability by calculating Fuzzy Kappa [78] between each of the three pairs of coders because each explanation could receive multiple codes. All three pairwise Fuzzy Kappa values were greater than 0.8, indicating sufficient interrater reliability [78]. After reaching this level of agreement, the first author coded the remaining explanations.
We then compared the fractions of nominations between every possible combination of men and women nominating each other (i.e., man nominating a man, man nominating a woman, woman nominating a man, and woman nominating a woman) containing each code. This comparison only included explanations for which both the nominator and the nominee self-reported their gender as either man or woman (859 out of the 1,000 total explanations across all four courses). We also aggregated the data from the fall and spring offerings because the results for each individual offering were not substantially different from the aggregated results and the larger data set reduces possible statistical noise. Similar to our peer interactions analysis, we did not perform any statistical tests because the goal of this analysis was to identify any large-scale differences in these fractions. Instead, we used overlap in error bars (given by standard errors) to make qualitative interpretations about differences in code frequencies between men and women. This comparison allowed us to determine whether and how any gender bias in peer recognition in each course is related to what gets recognized (i.e., students nominating one another for different skill sets).
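The comparison of code frequencies could be sketched as below. The explanations, their codes, and the resulting numbers are hypothetical and serve only to show the bookkeeping. The standard error formula (binomial standard error of a proportion) and the overlap rule (intervals of plus or minus one standard error intersect) are our assumptions about how such error bars might be compared, not details confirmed by the text.

```python
from collections import defaultdict
from math import sqrt

def code_fractions(explanations):
    """Map gender pair -> code -> (fraction of nominations with the code, standard error).

    `explanations` holds (nominator gender, nominee gender, set of codes) tuples;
    a single explanation may carry several codes.
    """
    totals = defaultdict(int)
    with_code = defaultdict(lambda: defaultdict(int))
    for nominator, nominee, codes in explanations:
        pair = (nominator, nominee)
        totals[pair] += 1
        for code in codes:
            with_code[pair][code] += 1
    fractions = {}
    for pair, n in totals.items():
        fractions[pair] = {}
        for code, k in with_code[pair].items():
            p = k / n
            fractions[pair][code] = (p, sqrt(p * (1 - p) / n))  # assumed binomial SE
    return fractions

def error_bars_overlap(a, b):
    """True if the intervals p +/- SE of two (p, SE) estimates intersect."""
    (p1, se1), (p2, se2) = a, b
    return abs(p1 - p2) <= se1 + se2

# Hypothetical coded explanations written by men.
explanations = [
    ("man", "man", {"helping", "understanding"}),
    ("man", "man", {"helping"}),
    ("man", "man", {"understanding"}),
    ("man", "man", {"helping"}),
    ("man", "woman", {"understanding"}),
    ("man", "woman", {"understanding", "explaining"}),
    ("man", "woman", {"helping"}),
    ("man", "woman", {"understanding"}),
]

fractions = code_fractions(explanations)
helping_overlaps = error_bars_overlap(
    fractions[("man", "man")]["helping"], fractions[("man", "woman")]["helping"]
)
understanding_overlaps = error_bars_overlap(
    fractions[("man", "man")]["understanding"], fractions[("man", "woman")]["understanding"]
)
```

With so few toy explanations the error bars are wide, which mirrors the caution needed when interpreting individual codes that appear in only a small number of nominations.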

III. RESULTS
We present the results from each of the four stages of analysis (Fig. 2): recognition network structure, ERGM analysis, comparison to interaction networks, and explanations of nominations of strong peers.

A. Recognition network structure
The structural features of the observed recognition networks, summarized in Table II and shown in Fig. 4, provide information about broad patterns of recognition. We observe that the densities of the lab and lecture recognition networks are similar within each semester (fall and spring). Because there are many more nodes in the lab recognition network than the lecture recognition network in each semester (Table II), the densities suggest that there is a higher level of connectedness in the lab networks than the lecture networks (i.e., a higher proportion of possible edges exist in the networks in the left column of Fig. 4 than in the corresponding network in the right column). We also observe relatively low indegree centralization values in each network. This observation indicates that the nominations are fairly spread out among the students in a course rather than concentrated around only one or a few students. Correspondingly, there are no outstanding "celebrities" in any network; no nodes are much larger than the rest, which would be associated with receiving many nominations (Fig. 4). While there is one man in the fall lecture network who receives more nominations than anyone else (seven), outstanding celebrities in the large science courses analyzed in prior work receive more than 30 nominations [14,30].
We also observe that all four networks contain one or two relatively large components that connect many nodes along chain-like formations and many smaller components of two to four nodes that are only connected to each other. The prevalence of these smaller components, however, is stronger in the lab course than the lecture course as indicated by the higher levels of transitivity. This pattern is likely due to the lab course placing more emphasis on small group work during the lab sessions (e.g., coordinating experimental investigations and submitting the lab notes for a group grade) than the lecture course does during the discussion sessions (e.g., collaborating on problems but not submitting work for a grade). We also see a large fraction of isolated nodes (at least 30% of the total nodes in each network), representing individuals who responded to the survey but did not nominate any peers as being strong in the course material and who were also not nominated by any other students. The demographics of the isolated nodes (e.g., gender and race or ethnicity) in each network are proportional to the demographics of the course population.

B. Exponential random graph models
As per our research questions, we focus on the coefficient estimates of the ERGM terms that speak to the extent to which a gender bias exists in the recognition networks (see Fig. 5 and Table V in the Appendix). We find that in both offerings of the lab course, women proportionately nominate men and women as strong in the course material as compared to men nominating men (red and orange dots on the left panel of Fig. 5). Women also proportionately nominate men and women as strong in the course material as compared to men nominating men in the fall offering of the lecture course (orange dots on the right panel of Fig. 5). Women nominate men less frequently than men nominate men, however, in the spring offering of the lecture course. Additionally, in both offerings of both courses men disproportionately under-nominate women as strong in the course material as compared to men nominating men (pink dots on both panels of Fig. 5). Although comparing ERGM coefficient values across different-sized networks is ill-defined [79], we tentatively observe that this bias from men occurs to a similar extent in every course, with the possible exception of the spring offering of the lab course, which has a slightly smaller coefficient estimate for the man → woman variable.
In all cases, the comparisons are made after adjusting for the other variables in our model. We particularly note that these results related to gender bias hold even after controlling for the gender composition of lab groups (lab group homophily variable), which were intentionally made to avoid isolated women, and other patterns of student interactions (e.g., lab section homophily and discussion section homophily variables). In the lab course, the results also hold after controlling for any bias based on lecture course enrollment (physics majors → physics majors, physics majors → non-majors, and non-majors → physics majors variables). While the non-majors lecture course is fairly gender balanced (Table I), the physics majors course contains a majority of men (70-80%, depending on the semester). A bias in which nominations favor students in the physics majors course, therefore, would make men more likely to be nominated than women. The gender bias we observe, however, is present even after controlling for the lecture enrollment variables included in our model (and thus the different student populations in the two lecture courses).

C. Peer interactions
To further understand how students identify who to recognize as strong in their physics course, we also analyzed students' self-reported interactions. The interaction network diagrams are shown in the Appendix (see Fig. 9).

FIG. 6: Fraction of edges in each interaction network that also appear in the corresponding recognition network for each combination of nominator and nominee gender. Error bars represent standard errors of the proportions.
Comparing the densities of the interaction networks (Table III) to the densities of the corresponding recognition networks (Table II), we find in all four courses that the interaction network is more dense (i.e., students are more connected by edges) than the recognition network. This comparison may suggest that when making their nominations of strong peers, students select a subset of the peers with whom they interact to nominate as strong in the course material.
The percent overlap values, the percent of edges in the recognition network that are also in the corresponding interaction network, however, indicate that interactions only account for a little more than half of the nominations of strong peers in each recognition network (Table III). That is, students also nominate peers with whom they did not report interacting. These results indicate that interaction-based recognition (e.g., learning about peers' skills by working together on a problem set or discussing concepts together in lecture) and observation-based recognition (e.g., learning about peers' skills by seeing a student ask a question in class or watching a student in a nearby lab group collect data for an experiment) occur with similar frequencies in the observed courses.
We examined the extent to which the gender biases in peer recognition identified in the ERGMs (in which men under-nominate women in all courses and women under-nominate men in the spring lecture course) are related to gender biases in the interaction-based nominations. We identified the fraction of students' interaction ties that they also nominate as strong in the course, split by gender of the nominee (Fig. 6). In three out of the four courses (all but the spring offering of the lab course), we observe that men disproportionately "keep" more of their interaction ties with men than with women (see top panel of Fig. 6). This pattern indicates that the gender bias coming from men that we observed in the ERGM analysis for these courses is (at least partly) related to a gender bias in interaction-based recognition. Men exhibit strong gender homophily in their peer interactions (Table III) and, after adjusting for this homophily (i.e., the metric shown in Fig. 6 is normalized by the number of interactions made to peers of a given gender), men exhibit a gender bias in which of their interaction ties they select as strong in the course material.
In the spring offering of the lab course, in contrast, men proportionately nominate their interaction ties to men and women as strong in the course (see top panel of Fig. 6). The gender bias in peer recognition coming from men in this course, therefore, is likely related to a gender bias in observation-based recognition, though we could not measure such a bias in our study (see Sec. II C 3).
Finally, the gender bias in the spring lecture course, in which women under-nominate men as strong in the course, is (at least partly) related to interaction-based recognition: women disproportionately nominate more of their interaction ties with women than with men (see bottom panel of Fig. 6). In all other semesters, where we do not observe a gender bias coming from women in the ERGMs, we correspondingly find that women nominate proportional numbers of men and women from their interaction ties as strong in the course material.

D. Explanations
We devised a coding scheme characterizing students' written explanations of their nominations of strong peers to determine what gets recognized in peer recognition, that is, the specific skill sets that students associate with being strong in their physics course (see Table IV). The coding scheme illuminates the features of peer interactions and indirect observations of peers that students consider when selecting which of their peers to nominate.
Related to knowledge, students describe strong peers as those who have understanding of the course material and have high performance (e.g., earn high grades). Students also mention that the peers they nominate have experience or background knowledge relevant to the course and have a natural ability for learning physics. Nominees were also described as hard-working or having motivation to learn the course material.
Students identify multiple processes associated with being strong in their physics courses. These codes are course-specific. In the lab course, students describe strong peers as those who carried out the data analysis for their experiment, contributed to the planning or experimental design, participated in data collection, and engaged in writing up the lab notes. In the lecture course, students acknowledge the problem solving abilities of their strong peers.
Students also consider features of their interactions with others in the course when forming perceptions of their peers. Some students describe strong peers as helping them learn the course material and explaining the course material to others. Students also describe nominees as having high levels of verbal participation during lectures or group work and leading group work during in-class activities.
Finally, some students note other reasons that they viewed their peers as strong in the course material. These explanations are either too vague to associate with one of the above codes or too infrequent to create a separate code. Explanations left completely blank are coded as none.
Students proportionately recognize their interaction-based and observation-based nominees for each of these skills, except for process-related codes. In the lab course, students are more likely to describe their observation-based nominees than their interaction-based nominees as being strong in processes (specifically, data analysis). In the lecture course, students are more likely to describe their interaction-based nominees than their observation-based nominees as being strong in processes (specifically, problem solving). More detail about this comparison can be found in the Appendix (see Fig. 10).
Comparing fractions of nominations containing each code by the nominator and nominee gender, we find that in both the lab and lecture courses women proportionately nominate men and women for each category of the coding scheme (bottom row of Fig. 7a), with small differences for most individual codes (bottom row of Fig. 7b). One interesting exception is that women disproportionately nominate men more than women for understanding in the lab course, though women do not exhibit a gender bias in their nominations in either offering of this course (Fig. 5).
In the lecture course, men also proportionately nominate men and women for each category despite the gender bias identified in the ERGMs (top right box of Fig. 7a). In the lab course, however, men nominate men more than women for features of their interactions (top left box of Fig. 7a). Examining the individual codes, we see that men disproportionately over-nominate men as strong in the lab course for two of the individual interactions codes, helping and explaining (top row of Fig. 7b), with the largest difference occurring for helping. It is important to note that these comparisons for individual codes are limited by small sample sizes (Table IV).
These results add nuance to our previous stages of analysis. The gender bias in the lecture course is seemingly related to who gets recognized (i.e., students disproportionately over-nominating interaction ties to peers of their same gender) and not what gets recognized. In the lab course, however, the observed gender bias is also related to what gets recognized: men nominate more men than women because of the ways they interacted.

IV. DISCUSSION
In this study, we aimed to disentangle who and what gets recognized in peer recognition to better understand the nature of gender bias in such recognition (identified in, e.g., Refs. [13,14,30]). Across two offerings of distinct lab and lecture physics courses, we found that students determine who gets recognized in peer recognition in two ways, each with a similar frequency: interacting with peers and indirectly observing peers with whom they do not interact. We also identified what gets recognized in peer recognition: students mention skill sets related to knowledge, processes, and interactions in their written explanations of nominations.
In the following sections, we synthesize these results related to gender bias in peer recognition (in which men disproportionately under-nominated women as strong in both courses and women disproportionately under-nominated men as strong in one offering of the lecture course) and relate our findings to prior work. We also discuss other implications for research suggested by our analysis.

A. The nature of gender bias in peer recognition
In both offerings of the lecture course, we found a similar gender bias in peer recognition (in which men under-nominated women compared to men) to previous work examining science courses aimed at first-year students during in-person and remote physics courses [13,14] and in-person biology courses [30]. Such a bias was not previously observed in in-person mechanical engineering courses [31] and remote physics courses [14] aimed at beyond-first-year students, which is likely due to students in those courses being more familiar with each other, as noted in Ref. [14]. Different from prior work [13,14,30], we also observed a bias from women in the spring offering of the lecture course: women disproportionately under-nominated men as strong in this course. Surprisingly, we also found a gender bias in peer recognition (in which men disproportionately under-nominated women) in both offerings of the lab course even though we did not previously observe a gender bias within a comparable lab context of a remote physics course (Course A in Ref. [14]). We build on the body of previous work by also probing the nature of these gender biases in peer recognition, finding that whether the gender bias is related to who and/or what gets recognized varies by instructional context, whether lecture or lab, described next.

Our analysis suggests that in both offerings of the lecture course, the gender bias in peer recognition coming from men's nominations is related to men disproportionately over-nominating men with whom they interact as compared to women with whom they interact as strong in the course material (i.e., there was a gender bias in interaction-based recognition). We observed that this bias is likely not attributable to men recognizing men and women for different skill sets because, of the people they recognize, men nominated men and women for similar skill sets.
The latter result is somewhat surprising given the research literature showing that students associate men and women with having different skill sets in lecture [44,46,49]. One study, for example, found that students associate men more than women with having a natural ability for learning physics and associate women more than men with asking questions [44]. If our observed gender bias in the lecture course is instead due to who gets recognized, rather than what gets recognized, then the observed bias may be due to gender stereotypes more broadly (i.e., students generally associating men more than women with being strong in physics) [3,7,9,17-26]. Alternatively, this gender bias may be related to prior literature's suggestion that students' social networks, and subsequently peer perceptions, exhibit strong gender homophily [40,41,70]. This interpretation is supported by our analysis of women's nominations, described next.
We found that women under-nominated men in the spring offering of the lecture course, opposite of what prior work would predict (i.e., a gender bias in peer recognition in favor of men, rather than against men) [13,14,30]. In this offering, the observed bias was related to women nominating more of their interaction edges to women than men as strong in the course. Similar to nominations from men in this course, women proportionately nominated men and women for the skills identified in our analysis. The gender bias in the lecture course coming from both men and women, therefore, is related to who gets recognized (particularly, students nominating their same-gender interaction ties) and not what gets recognized.

Gender bias in the lab course is related to both who and what gets recognized
We found that the gender bias in the lab course (in which men disproportionately under-nominated women as strong in the course) is related to a gender bias in interaction-based recognition in the fall offering (similar to the patterns observed in the lecture courses described above), but is likely related to a gender bias in observation-based recognition in the spring offering (though we could not measure this). Different from the lecture course, the nature of peer interactions is also a possible source of the gender bias in the lab course: men nominated men more than women for skills related to their interactions, such as helping them with the course material and explaining course material to others more generally. This result is consistent with prior work proposing that the "chilly climate" for women in physics may be due to the nature of their peer interactions rather than their number of peer interactions [66]. It is surprising, however, that the men and women in our study proportionally nominated one another for specific experimental skills in lab, such as handling the equipment to collect data or leading the group, despite evidence of students' gendered engagement in these roles [18,19,80].
One possible explanation for why we observed a gender bias (from men) in the in-person lab course but not the remote labs (Course A in Ref. [14]) is that these two labs had very different course structures, which ultimately shaped opportunities for peer interactions and observations. In the remote course, the lab was attached to the lecture course and almost all work for lab was done during the lab sessions within lab groups in isolated breakout rooms on Zoom. The lab had no whole-class lecture session or summative assessments (e.g., exams or quizzes) and students' lab grades were combined with their lecture course grade. The in-person lab course, in contrast, included a whole-class lecture session that included peer instruction activities and three summative quizzes. Students also received a separate course grade for the lab material. We suspect that these changes in course structure facilitated more out-of-lab-group interactions, including both in-lecture interactions and out-of-class interactions. Out-of-class interactions may also have been more common during the in-person course simply due to the evolving nature of the COVID-19 pandemic.

Summary
The results from both courses add nuance to our previous work, which found that the presence or absence of a gender bias in peer recognition varies between the instructional contexts of lab and lecture [14]. In this study, we observed a gender bias in peer recognition (either from both men and women or just men) in both the lab and lecture course but found that the sources of this bias varied: whether only due to a bias in who gets recognized (lecture) or also due to a bias in what gets recognized (lab). These results may suggest that pedagogical style (e.g., having both lecture and small-group sessions related to instructional material) may impact gender bias in peer recognition more than the instructional material itself. Future investigations of peer recognition, however, should continue to analyze peer recognition in these instructional contexts separately and should examine the nature of gender bias in peer recognition in other lab and lecture contexts, such as those at other institutions, with students from other types of majors, or with students in studio physics courses. Researchers should also more closely investigate how students develop perceptions of strong physics peers, such as through interviews, which may better illuminate some of the differences we observed between courses and semesters and between men's and women's nominations.

B. Other implications for research
Here we synthesize our findings that are not directly tied to the research questions of this study with those of previous research and suggest directions for future work.

Structure of peer recognition networks
The structures of the peer recognition networks in this study differ from those observed in prior work. Specifically, a relatively small fraction of nodes in the recognition networks of the in-person lecture courses in this study comprised the giant component (the largest cluster of connected nodes), with connections instead forming short chain-like structures and some small, disconnected components. In previous studies [14,30,31], in contrast, the lecture recognition networks for both in-person and remote science courses had relatively large giant components (i.e., a large proportion of the nodes connected together in a main cluster). We propose that this difference in network structure is due to differences in course structure and populations. Refs. [30,31], for example, examined second-semester (or later) in-person courses within a multi-course sequence, when students have likely developed familiarity with one another. Thus, students could likely identify many of their peers, including outspoken students in lecture who then become celebrities in the network. Similarly, in the remote physics courses studied in Ref. [14], Zoom likely expedited name familiarity and connections across many different peers (not just those in close physical proximity) even for students in the first course of the sequence. The in-person lecture courses analyzed in this study, however, are the first in the physics sequence, when students are likely still getting to know one another and each other's names (e.g., even if some students are really outspoken during in-person lectures, other peers likely do not know their names). Future work should examine this effect of peer familiarity on peer recognition further through a longitudinal study, for example modeling how recognition networks change (or not) throughout a course sequence (e.g., similar to Ref. [81] in which the authors analyze changes in peer interaction networks over time).
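As an illustration of the giant-component measure discussed above, the following minimal sketch computes the fraction of nodes in the largest connected cluster of a nomination network. It is not the analysis code used in this study: the function name, the toy node labels, and the choice to ignore edge direction (i.e., to use weakly connected components) are all assumptions made for the example.

```python
from collections import defaultdict


def giant_component_fraction(nodes, edges):
    """Fraction of nodes in the largest weakly connected component.

    nodes: list of student identifiers (hypothetical labels).
    edges: list of (nominator, nominee) pairs; direction is ignored here,
           which is an assumption of this sketch.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        largest = max(largest, size)
    return largest / len(nodes) if nodes else 0.0


# Hypothetical toy network: one short chain, two pairs, and one isolate.
toy_nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "G")]
toy_fraction = giant_component_fraction(toy_nodes, toy_edges)  # 3 of 8 nodes
```

In this toy network the largest cluster is the three-node chain, so the giant component holds 3/8 of the nodes, mirroring the chain-like, fragmented structure described for the in-person lecture courses.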
We also observed lower transitivity (small group clustering) in the peer recognition networks of in-person lab courses (this study) than the peer recognition networks of remote physics labs [14]. This difference is likely due to the different course structures mentioned above and the different instructional modalities. In the remote physics labs [14], students only attended low-enrollment lab sessions where they worked in Zoom breakout rooms with the same peers every week. Therefore, students had limited access to peers (and these peers' names) outside of their immediate lab group. Students in the in-person lab course (this study), however, could see and interact with other students in their lab section, including those outside of their immediate lab group. These students also attended a large lecture each week where they could interact with peers outside of their lab group and lab section entirely. Increased visibility of out-of-lab-group peers within individual lab sections and the addition of a lecture session to the lab, therefore, likely increased the number of peers students had access to, which in turn reduced the number of small, isolated clusters in the recognition network. Future work should investigate peer recognition in other course structures containing structured small group activities, such as studio physics and modeling instruction, where patterns of recognition might also vary.
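To make the transitivity measure concrete, the sketch below computes global transitivity as the fraction of connected triples that are closed into triangles. This is the standard undirected definition and is an assumption here; the exact variant computed for the recognition networks in this study may differ, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def transitivity(edges):
    """Global transitivity: closed triples divided by all connected triples.

    edges: list of (u, v) pairs, treated as undirected (an assumption of
           this sketch); self-loops are ignored.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    closed = 0
    triples = 0
    for center, nbrs in adjacency.items():
        # Every pair of neighbors of `center` forms one connected triple.
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adjacency[a]:
                closed += 1  # the triple is closed into a triangle
    return closed / triples if triples else 0.0


# Hypothetical lab-group-like cluster: a triangle plus one attached peer.
clustered = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
# Hypothetical chain-like structure with no closed triangles.
chain = [("A", "B"), ("B", "C"), ("C", "D")]

clustered_value = transitivity(clustered)  # 3 closed of 5 triples
chain_value = transitivity(chain)          # 0 closed of 2 triples
```

The clustered toy network has higher transitivity than the chain, which is the qualitative contrast described above between tightly grouped remote lab networks and the less clustered in-person lab networks.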

How does peer recognition form?
We found that both peer interactions and indirect observations of peers play an important role in shaping peer recognition, in line with what prior work suggests [15,30,38,39]. Our analysis, therefore, illuminates the complexities behind how students form perceptions of their peers (i.e., interaction-based versus observation-based recognition). Future work should further investigate the relationship between interactions and peer recognition, for example by conducting egocentric network analyses to probe the more fine-grained social processes underlying the development of peer recognition. Our results also suggest that future analyses of peer recognition should continue to probe and analyze interaction networks alongside student nominations of strong peers. Examining both networks together will likely provide a more nuanced understanding of how patterns of recognition form.
Finally, we devised a coding scheme to identify the different skill sets for which students nominate their peers as strong in their physics course. These codes encompassed a variety of skills students identified in their explanations of nominations: knowledge (e.g., content knowledge, getting high grades, and having a natural ability to learn physics), engagement in processes (e.g., designing an experiment, analyzing data, and solving problems), and interactions (e.g., helping, explaining, and leading). Such skills have been identified across qualitative research studies investigating how students define being a good physics student or a good physicist [18,19,42–51]; however, our study is the first to evaluate these skills at the scale of a large introductory physics course. Future work should examine whether this coding scheme is applicable to explanations from students in other institutions or experiencing different instructional styles and pedagogies.

C. Limitations
We end this section by acknowledging the limitations of our study. First, the online network survey may not have captured all nominations of strong peers and all peer interactions. Students may not have remembered the names of individuals they perceived as strong in the material and/or with whom they interacted, for example due to recall bias. We also only collected survey responses in the middle of the semester. Previous work administered surveys either both at the middle and end of the course [30] or only at the end of the course [31] and so this methodological choice allowed us to compare our results to that previous work. Future research examining in-person physics students' recognition of strong peers at multiple points in their physics course, or just at the end of their physics course, however, may add nuance to our results.
Additionally, we categorized students' race or ethnicity by URM status because the number of students in some of the individual racial or ethnic groups was too small for our statistical analysis to produce useful and interpretable results. This treatment of race or ethnicity, however, inevitably masks differences in peer recognition between students of individual racial or ethnic groups. Because our research questions were focused on the role of gender in peer recognition, this categorization allowed us to account for race and ethnicity in a cursory way in our analysis. Future work should seek to study the role of race and ethnicity in peer recognition explicitly by using more diverse student populations with statistically sufficient representation across racial or ethnic groups. Researchers should also aim to differentiate peer recognition among White and Asian students and between Asian and Asian American students [82,83].
We also observed sparser recognition networks than those in prior work [14,30,31], particularly with regard to the proportion of students who were isolates (nodes with zero adjacent edges in the network). It is impossible for us as researchers to know if these isolates are "true isolates," students who truly have zero nominations to make or receive, or if they are essentially non-respondents who fill out the survey quickly and do not nominate anyone, even though they may actually have nominations to make. Therefore, our analysis may have missed some nominations and the network structures may not represent all connections between students. Fortunately, social network data is robust to up to 30% missing data [57] and our recognition networks contain between 32% and 48% isolates. Assuming some students are true isolates, our analysis likely captures a fairly accurate picture of peer recognition. Furthermore, all of our ERGMs converged with appropriate goodness-of-fit diagnostics, indicating that the statistical analysis is reliable; any uncertainties due to small sample sizes are reflected in the standard errors and p-values of the coefficient estimates.
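The isolate proportion referred to above can be computed directly from a roster and an edge list. The short sketch below is illustrative only: the function name and toy data are hypothetical, and it assumes that every enrolled student appears in the roster whether or not they made or received a nomination.

```python
def isolate_fraction(roster, nominations):
    """Fraction of students with zero adjacent edges (no nominations made or received).

    roster: list of all student identifiers in the network (hypothetical labels).
    nominations: list of (nominator, nominee) pairs.
    """
    if not roster:
        return 0.0
    connected = set()
    for nominator, nominee in nominations:
        if nominator != nominee:  # a self-nomination does not connect a student to a peer
            connected.add(nominator)
            connected.add(nominee)
    isolates = [student for student in roster if student not in connected]
    return len(isolates) / len(roster)


# Hypothetical roster of ten students, six of whom appear in at least one nomination.
toy_roster = [f"S{i}" for i in range(10)]
toy_nominations = [("S0", "S1"), ("S2", "S1"), ("S3", "S4"), ("S5", "S4")]
toy_isolates = isolate_fraction(toy_roster, toy_nominations)  # 4 of 10 students
```

Note that this count cannot distinguish true isolates from non-respondents; that distinction requires information beyond the edge list, as discussed above.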
Finally, the majority of our analysis assumes that any overlap in the recognition and interaction networks implies that peer recognition was formed through peer interactions. Alternatively, interactions may plausibly be formed through recognition, such as by a student deciding to interact with peers they perceive as strong in the course material. We believe the former is more likely than the latter in this study because the courses we analyze are the first in the course sequence, when students have likely not met each other before, and because the survey was administered in the middle of the semester, when students may not have a strong sense of each other's grades. Nonetheless, future work should seek to disentangle this relationship between interactions and recognition, if possible.

V. CONCLUSION
In this study, we observed a gender bias in student nominations of strong physics peers in distinct lab and lecture courses. In both courses, men under-nominated women as strong in the course. Women also under-nominated men as strong in one offering of the lecture course. To understand the nature of this gender bias in peer recognition, we built upon two existing threads of research. First, we investigated the role of peer interactions in students' determination of who gets recognized in peer recognition. Second, we examined what gets recognized: whether the skills students associated with being a strong physics student or strong physicist are related to the gender bias in peer recognition. We found that roughly half of nominations of strong peers formed through peer interactions, with the remaining nominations likely coming from indirect observations of peers. We also observed that the nature of the gender bias varied between the lab and lecture courses. In the lecture course, the bias was related to who gets recognized: both men and women disproportionately over-nominated their interaction ties with students of their same gender as strong in the course material. At the same time, men and women nominated men and women for similar skill sets in this course. In the lab course, in contrast, men also disproportionately under-nominated women for certain skill sets, particularly those related to their interactions, such as being helpful. These results add nuance to our understanding of how students form perceptions of strong peers in their physics courses and prompt future work to examine the nature of gender bias in peer recognition in other course structures, instructional styles, and student populations.

• revising their own thinking (e.g., "She understands and is confident in the course material and can iteratively revise her thinking.")
• creative (e.g., "He quickly comes up with good ideas for experiments and is very creative with collecting data.")
• hearing from someone else that the nominee is strong in the course (e.g., "I've heard great things of this man.").

E. Explanations split by interaction-based and observation-based nominations
Figure 10 shows the fraction of interaction-based nominations (i.e., nominations of strong peers with whom the nominee also reported interacting) and observation-based nominations (i.e., nominations of strong peers with whom the nominee did not report interacting) falling under each category or code from our coding scheme.
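The split between interaction-based and observation-based nominations can be sketched as a comparison of two edge lists. The example below is a minimal illustration, not the procedure used in this study: the function name, the toy data, and the `undirected` option (which counts an interaction if either student reported it) are assumptions introduced for the example.

```python
def split_nominations(nominations, interactions, undirected=True):
    """Split nominations into interaction-based and observation-based sets.

    nominations: list of (nominator, nominee) pairs.
    interactions: list of (reporter, peer) pairs from the interaction prompt.
    undirected: if True, an interaction reported by either student counts
                for both (an assumption of this sketch).
    """
    reported = set(interactions)
    if undirected:
        reported |= {(peer, reporter) for reporter, peer in interactions}
    interaction_based = [edge for edge in nominations if edge in reported]
    observation_based = [edge for edge in nominations if edge not in reported]
    return interaction_based, observation_based


# Hypothetical toy data.
toy_nominations = [("A", "B"), ("A", "C"), ("D", "B"), ("E", "A")]
toy_interactions = [("A", "B"), ("A", "E")]

with_ties, without_ties = split_nominations(toy_nominations, toy_interactions)
overlap_fraction = len(with_ties) / len(toy_nominations)  # 2 of 4 nominations
```

With `undirected=False`, only the nomination from A to B would count as interaction-based in this toy example, so the choice of how to treat one-sided interaction reports can change the measured overlap.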

FIG. 4: Recognition networks for all analyzed courses. Nodes are colored by gender and sized proportional to indegree (number of received nominations as strong in the course). Edges point from the nominator to the nominee.

FIG. 5: Coefficient estimates, represented as log-odds, for the gender variables of the exponential random graph models (values shown in Table V in the Appendix). The base term (i.e., coefficient estimate of zero) is nominations from man to man. Error bars represent standard errors and asterisks indicate statistical significance.

FIG. 7: Fraction of nominations within each gender combination falling under each (a) category of our coding scheme within each course and (b) code of our coding scheme for the lab course only, split by gender of the nominator and nominee. Results are aggregated across the fall and spring offerings of each course. We coded 300 (205 in lab and 95 in lecture), 106 (79 in lab and 27 in lecture), 179 (127 in lab and 52 in lecture), and 274 (168 in lab and 106 in lecture) explanations from man to man, man to woman, woman to man, and woman to woman, respectively. Fractions do not necessarily sum to one because each explanation could receive multiple codes. Nominations made by men and women received comparable numbers of codes: men's nominations received an average of 1.4 codes and women's nominations received an average of 1.3 codes. Error bars represent standard errors of the proportions.

1. Gender bias in the lecture course is related to who gets recognized, not what gets recognized

FIG. 10: Fraction of nominations that are interaction-based and observation-based falling under each (a) category and (b) code of our coding scheme, split by course. Fractions do not necessarily sum to one because each explanation could receive multiple codes. Error bars represent standard errors of the proportions.

TABLE I: Summary of survey response rates and self-reported student demographics for the four courses we analyzed. All analyzed students in the lecture course are also in the lab course of the corresponding semester. Percentages are relative to the number of students included in the analysis unless specified otherwise. We grouped race or ethnicity by underrepresented racial minority (URM) status, where non-URM students are those solely identifying as White and/or Asian or Asian American and URM students are those identifying as at least one of any other race or ethnicity (including American Indian or Alaska Native, Black or African American, Hispanic or Latinx, and Native Hawaiian or other Pacific Islander). We denote students' gender and race or ethnicity as "unknown" if they preferred not to disclose this information on the survey or if they did not complete the survey.

Please list the students in this physics lab class that you think are particularly strong in the course material.
Please list any students in your physics lecture class that you think are particularly strong in the course material.
Please list any students in this physics lab class that you had a meaningful interaction* with about course material this week.
Please list any students in your physics lecture class that you had a meaningful interaction* with about course material this week.
*A meaningful interaction may mean in class, out of class, in office hours, virtually over zoom, through remote chat or discussions boards, or any other form of communication, even if you were not the main person speaking or contributing.

TABLE II: Network-level statistics for the observed recognition networks. Standard errors of the last digit are shown in parentheses.

TABLE III: Densities and gender homophily of the observed interaction networks and percent overlap of the recognition and interaction networks. Standard errors of the last digit of the densities are shown in parentheses.

TABLE IV: Definitions and examples of our coding scheme for students' explanations of why they nominated their peers as strong in the course material. Some codes were only present in either lab or lecture course nominations, indicated in parentheses. N indicates the total number of occurrences of each code in each course, lab or lecture. A more detailed description of explanations coded as Other, with examples, is provided in the Appendix.

TABLE V: Coefficient estimates for the exponential random graph models, represented as log-odds, fit to each observed recognition network. Standard errors are given in parentheses. Asterisks indicate statistical significance (*p < 0.05; **p < 0.01).
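Because the coefficients are reported as log-odds, exponentiating a coefficient gives the multiplicative change in the odds of a nomination relative to the base term. The sketch below illustrates this conversion with made-up numbers; the coefficient and standard error are hypothetical and are not values from Table V, and the approximate 95% interval (coefficient plus or minus 1.96 standard errors) is a common large-sample approximation assumed here rather than something reported in this study.

```python
import math


def odds_ratio(log_odds):
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(log_odds)


def odds_ratio_interval(log_odds, std_err, z=1.96):
    """Approximate interval for the odds ratio (normal approximation, assumed)."""
    return math.exp(log_odds - z * std_err), math.exp(log_odds + z * std_err)


# Hypothetical values for illustration only; not taken from Table V.
example_coefficient = -0.5
example_std_err = 0.2

example_ratio = odds_ratio(example_coefficient)
example_low, example_high = odds_ratio_interval(example_coefficient, example_std_err)
```

A coefficient of zero corresponds to an odds ratio of one (no difference from the base term), and a negative coefficient corresponds to an odds ratio below one, i.e., lower odds of a nomination for that gender combination than for the base term, conditional on the rest of the network.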