Analysis of Praxis physics subject assessment examinees and performance: Who are our prospective physics teachers?

Lisa Shah, Jie Hao, Christian A. Rodriguez, Rebekah Fallin, Kimberly Linenberger-Cortes, Herman E. Ray, and Gregory T. Rushton
Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, USA
Department of Statistics and Analytical Science, Kennesaw State University, Kennesaw, Georgia 30144, USA
Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, Georgia 30144, USA
Analytics and Data Science Institute, Kennesaw State University, Kennesaw, Georgia 30144, USA
Institute for STEM Education, Stony Brook University, Stony Brook, New York 11794, USA


I. INTRODUCTION
Teachers have a significant impact on the success of their students in the classroom and beyond into the workforce, and the effectiveness of STEM teachers is particularly critical for the preparation of highly qualified STEM professionals. Studies have reported strong correlations between teacher qualifications, which are often used as proxies for assessing effective teaching, and student achievement in STEM [1][2][3]. In-field degrees (i.e., those consistent with the subject being taught) have been among the most impactful teacher qualifications, as they are typically an indication of having had extensive academic training in a subject. What and how much a teacher understands about a subject is reasonably expected to influence their ability to teach it effectively [4][5][6].
In recognition of the importance of subject matter knowledge in effective teaching, one of the final requirements on the pathway to becoming a beginning teacher is often the demonstration of content knowledge proficiency on a subject certification exam. In the United States, the Praxis physics subject assessment is one such exam, administered in 36 states and Washington, D.C. over the past decade [7]. The 100-question multiple-choice assessment includes topics commonly encountered in an introductory college physics course, including mechanics; electricity and magnetism; optics and waves; and heat, energy, and thermodynamics [8]. To be recommended for certification, prospective teachers must demonstrate some minimum level of proficiency as designated by content experts and state officials [9].

II. BACKGROUND AND PRIOR RESEARCH
Previous studies of physics teacher content knowledge have established the centrality of subject expertise in effective teaching. These studies have examined physics teachers' topic-specific content knowledge [10][11][12] and its effect on their self-confidence in teaching physics [13]. Knowledge of physics has been identified as an important foundation for being able to appropriately relay this knowledge to students [4][5][6]. While these studies have highlighted the importance of strong teacher content knowledge, the extent to which incoming physics teachers in the U.S. demonstrate understanding as they prepare to lead students in physics classrooms has not yet been investigated. Such an assessment would serve to evaluate the current state of physics education and identify ways to improve it.
Additionally, a number of recent reports from the physics education community have generally concluded that the teaching workforce suffers from a lack of diversity with respect to gender [14,15], though far fewer studies have acknowledged the lack of racial diversity [16]. Several reports by the American Institute of Physics (AIP) have found (i) lagging participation of African American, Black, and Hispanic high school students in physics relative to other races or ethnicities [17], (ii) no detectable changes in the participation of African American and Black students in undergraduate physical sciences and engineering programs [18], and (iii) a recent decline in the already low percentage of women and African American and Black students pursuing physics degrees [19]. Furthermore, our work and that of others in this area have built on these findings, establishing that the diversity of the high school physics teaching workforce has not changed appreciably in the past two decades with respect to race and ethnicity (i.e., 90% white in 1987 to 87% white in 2011) and has made just minor improvements with respect to gender equity (i.e., 20% female in 1987 to 32% female in 2011) [20][21][22]. However, evaluations of the diversity of prospective physics teachers (i.e., aspiring professionals) relative to the diversity of the resulting workforce have not yet been conducted. Insights from these analyses should help inform future policy and initiatives aimed at improving the diversity of the physics teaching workforce.

III. STUDY CONTEXT AND RESEARCH QUESTIONS
The goal of the presented work was to understand the demographics of prospective high school physics teachers in the United States in the past decade and the extent to which test takers with different personal and professional characteristics demonstrated knowledge of the subject prior to beginning their first physics teaching assignment. We report findings from our analysis of Praxis physics subject assessment data spanning the past ten years to (i) characterize examinees by their personal and professional demographics, and (ii) evaluate their exam performance in relation to these demographic factors. The presented work is guided by the following research questions:
1. What have been the personal and professional characteristics of those who have taken the Praxis physics subject assessment in the past decade?
2. How have these personal and professional characteristics correlated with Praxis physics subject assessment performance in the past decade?

A. Praxis physics subject assessments
The Praxis subject assessments are a series of exams intended to assess the subject-specific content knowledge and related skills of beginning K-12 teaching candidates. Those pursuing careers as teachers take the exam as part of the certification process in the majority of U.S. states (currently 36 states and Washington, D.C.; see Appendix). Exam questions are prepared by panels of expert educators, teacher preparation faculty, and subject specialists and subsequently reviewed by ETS for validity, reliability, and issues of bias before administration [9]. Once examinees have taken the exam, passing scores are set at the state level (described in more detail in Sec. C).

B. Study sample
All Praxis physics subject assessment examinees between the ages of 18 and 75 who took the exam sometime between June 2006 and May 2016 were included in our analysis (N = 9667). Restricting the age range removed outlier examinees who may have misreported their date of birth. To avoid multiple scores for examinees, only the highest score was analyzed for repeat test takers. Information regarding test takers' personal and professional characteristics (e.g., race or ethnicity, gender, undergraduate major) was obtained from their self-reported responses to the demographic questionnaire accompanying their exam. A detailed analysis of examinee demographics is reported in the Appendix (Table III). To ensure an adequate level of privacy for examinees, the state postal code of each test taker's reported home address was used to group examinees by U.S. census region. As our analysis is performed at the level of the entire Praxis physics subject assessment population, it does not warrant the use of statistical parameters such as p values that are typically applied to analyses of population samples. We, therefore, do not report such error estimates in our findings.
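The two sample-preparation rules described above (an age window, and a single highest score per repeat test taker) can be sketched in a few lines. This is an illustrative sketch only: the record layout and the field names `examinee_id`, `age`, and `scaled_score` are assumptions made for the example, not the structure of the ETS data set.

```python
def prepare_sample(records, min_age=18, max_age=75):
    """Keep one record per examinee (the highest scaled score) within the age window.

    `records` is a list of dicts with hypothetical keys:
    'examinee_id', 'age', and 'scaled_score'.
    """
    best = {}
    for rec in records:
        # Drop examinees whose reported age falls outside the accepted range.
        if not (min_age <= rec["age"] <= max_age):
            continue
        current = best.get(rec["examinee_id"])
        # For repeat test takers, retain only the highest score.
        if current is None or rec["scaled_score"] > current["scaled_score"]:
            best[rec["examinee_id"]] = rec
    return list(best.values())


# Toy records (invented for illustration).
records = [
    {"examinee_id": "A", "age": 24, "scaled_score": 132},
    {"examinee_id": "A", "age": 25, "scaled_score": 148},   # retake, higher score
    {"examinee_id": "B", "age": 102, "scaled_score": 160},  # implausible age, dropped
    {"examinee_id": "C", "age": 31, "scaled_score": 155},
]
sample = prepare_sample(records)
```

In this toy input, examinee A contributes only the 148 attempt and examinee B is excluded by the age restriction.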

C. Common passing standard
Praxis physics subject assessment performance is reported as a scaled score between 100 and 200, which is a function of both the raw percentage of correct responses and the difficulty of the exam. To assist in the determination of state passing scores, ETS conducts a multistate standard-setting study, where content experts and current or former physics teachers evaluate the probability that a beginning physics teacher would respond correctly to an exam item. After several rounds of discussion, these judgments are summed and averaged to yield a final recommended passing score. Individual states are provided with this information, which is considered when setting their own passing standard (i.e., "cut" score), which may change from year to year [9]. Examinees who obtain scaled scores at or above a state's cut score are considered to have passed the exam. Following the precedent of Gitomer and colleagues [23], we used publicly available data on individual state passing scores and assumed a common, national passing standard. Using the median cut score across all Praxis physics subject assessment states in the analyzed time frame (see Appendix, Table IV), we assumed a passing standard of 140 for all analyses. To approximate the corresponding percent correct (as an evaluation of the threshold level of content knowledge needed for entry into the profession), we took the median percent correct score of all individuals who earned a scaled score of exactly 140 in the past decade, which resulted in an estimated corresponding percent correct score of 54%.
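The common-standard procedure described above amounts to three small computations: a median over state cut scores, a median percent correct among examinees at exactly that scaled score, and a pass rate against the common cut. The sketch below illustrates them under stated assumptions; the state labels, field names, and all numbers in the toy data are invented for the example and are not the values in Table IV.

```python
from statistics import median


def national_cut_score(state_cut_scores):
    """Median of the state-level cut scores (one value per state)."""
    return median(state_cut_scores.values())


def percent_correct_at(examinees, scaled_score):
    """Median percent correct among examinees with exactly this scaled score."""
    matches = [e["percent_correct"] for e in examinees
               if e["scaled_score"] == scaled_score]
    return median(matches) if matches else None


def pass_rate(examinees, cut):
    """Fraction of examinees scoring at or above the common cut score."""
    passed = sum(1 for e in examinees if e["scaled_score"] >= cut)
    return passed / len(examinees)


# Hypothetical inputs, for illustration only.
state_cuts = {"State 1": 126, "State 2": 140, "State 3": 153}
examinees = [
    {"scaled_score": 140, "percent_correct": 53},
    {"scaled_score": 140, "percent_correct": 54},
    {"scaled_score": 140, "percent_correct": 56},
    {"scaled_score": 128, "percent_correct": 47},
    {"scaled_score": 161, "percent_correct": 68},
]

cut = national_cut_score(state_cuts)
pct_at_cut = percent_correct_at(examinees, cut)
rate = pass_rate(examinees, cut)
```

A median is used rather than a mean so that a few unusually high or low state standards do not shift the common cut score.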

D. Model selection
Since the scaled scores were approximately normally distributed, a linear regression model was used to identify the demographic and/or personal characteristics of test takers that were most significantly related to their exam performance. A stepwise linear regression approach was used to identify the most appropriate linear model from the set of candidate independent variables (i.e., gender, race or ethnicity, undergraduate major). This stepwise procedure enters or removes one candidate per step based on specified information criteria, such as Akaike's information criterion (AIC) and Schwarz Bayesian criterion (SBC) [24,25]. We utilized SBC as the selection criterion, which tends to suggest simpler models with lower dimensionality than AIC. We chose tenfold cross validation (CV) as the stopping criterion, which reduces bias caused by variable selection. The SBC is defined to be

$$\mathrm{SBC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p \ln(n),$$

where SSE is the sum of squared errors, n is the sample size, and p is the number of parameters included in the model. The stepwise approach yields a single optimal model; however, there is typically more than one equivalent optimal model with slightly different combinations of variables. It was, therefore, necessary for us to manually include several key variables based on our previous experience and knowledge of these analytic methodologies. This process was employed to generate an aggregate model that identified which personal and professional characteristics were most strongly associated with Praxis physics subject assessment performance. Regression analysis was performed for each of the ten years independently to identify the most significant characteristics or combinations of characteristics for each year. Undergraduate major, undergraduate GPA, and gender were present in the year-level models for eight, eight, and seven of the ten years analyzed, respectively, while all other variables were present in the models for no more than two years. Only these most prevalent demographic characteristics were included as variables in the final model and are reported on in Sec. V. The final model explained 24% of the variation in scores (total η², Table I). The total η² for each model can be used to determine the partial contribution of each variable or interaction to the overall total variance at each step. Variables that are not included in the final model produced partial η² values that did not significantly improve the predictive capability of the model. A detailed analysis of only the most predictive variables or interactions is presented in Sec. V.
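To illustrate why SBC tends to favor lower-dimensional models than AIC, the sketch below compares the two criteria on a pair of hypothetical models. It assumes the common least-squares forms SBC = n ln(SSE/n) + p ln(n) and AIC = n ln(SSE/n) + 2p; the SSE values and parameter counts are invented for the example, and only the sample size (9667) is taken from the text.

```python
import math


def sbc(sse, n, p):
    """Schwarz Bayesian criterion for a least-squares fit (assumed form)."""
    return n * math.log(sse / n) + p * math.log(n)


def aic(sse, n, p):
    """Akaike's information criterion for a least-squares fit (assumed form)."""
    return n * math.log(sse / n) + 2 * p


n = 9667  # sample size reported in the text

# Hypothetical candidates: the larger model fits slightly better (lower SSE)
# at the cost of seven additional parameters.
small = {"sse": 1_000_000.0, "p": 3}
large = {"sse": 996_000.0, "p": 10}

sbc_small, sbc_large = sbc(small["sse"], n, small["p"]), sbc(large["sse"], n, large["p"])
aic_small, aic_large = aic(small["sse"], n, small["p"]), aic(large["sse"], n, large["p"])

# Lower is better for both criteria.
sbc_choice = "small" if sbc_small < sbc_large else "large"
aic_choice = "small" if aic_small < aic_large else "large"
```

With these made-up numbers, the per-parameter penalty under SBC is ln(9667), roughly 9.2, versus 2 under AIC, so SBC keeps the smaller model while AIC would accept the larger one.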

E. Differential item functioning (DIF)
To examine whether individual test questions functioned differently for similarly performing examinees with respect to gender and race or ethnicity, we performed differential item functioning (DIF) analysis following closely the procedures previously published by ETS [26,27] and briefly described here. For each test form used between June 2006 and May 2016, a new exam score was calculated for examinees after removing items that displayed differences in the likelihood of responding correctly with respect to race or ethnicity or gender. Test takers were then matched into quartiles of similar performers for each test form using this new total exam score. To estimate the relative probabilities of the focal group (i.e., female, Black, Hispanic) responding correctly to a question relative to the reference group (i.e., male, White), a logistic regression model was used to calculate the MH D-DIF statistic [27,28] for each item across forms. MH D-DIF was then used to sort items into one of three categories, A, B, and C, where (i) "A" items had MH D-DIF statistics that were less than 1.0, (ii) "C" items had MH D-DIF values statistically greater than 1.5, and (iii) all remaining items with MH D-DIF statistics between 1.0 and 1.5 fell into category "B." Findings from DIF analyses are reported in Tables V and VI (see Appendix).
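As a rough illustration of how an MH D-DIF value and its A/B/C category arise from matched groups, the sketch below uses the classic Mantel-Haenszel common odds ratio across score strata and the usual delta-scale conversion, MH D-DIF = -2.35 ln(α). This is a swapped-in simplification, not the procedure reported above: the text describes a logistic regression model, and the ETS rules also involve statistical significance tests that are ignored here. The counts are invented, and categories are assigned on the magnitude of the statistic.

```python
import math


def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF from matched strata.

    Each stratum is a tuple:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    Negative values indicate the item favors the reference group.
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        total = ref_right + ref_wrong + foc_right + foc_wrong
        if total == 0:
            continue  # skip empty strata
        num += ref_right * foc_wrong / total
        den += ref_wrong * foc_right / total
    alpha = num / den  # common odds ratio (reference odds / focal odds)
    return -2.35 * math.log(alpha)


def categorize(d):
    """Magnitude-only A/B/C rule (significance tests omitted in this sketch)."""
    size = abs(d)
    if size < 1.0:
        return "A"
    if size > 1.5:
        return "C"
    return "B"


# Four score quartiles per item, with invented counts.
fair_item = [(60, 40, 30, 20)] * 4     # equal odds in both groups
flagged_item = [(80, 20, 50, 50)] * 4  # reference odds four times the focal odds

fair_category = categorize(mh_d_dif(fair_item))
flagged_category = categorize(mh_d_dif(flagged_item))
```

Matching on total score before comparing groups is what separates DIF from a simple difference in pass rates: an item is flagged only when examinees of similar overall performance differ in their odds of answering it correctly.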

V. RESULTS

A. Median cut scores across states using the Praxis physics subject assessment
Figure 1 shows the state-level, median scaled score required for examinees to be recommended for a physics teaching certification in a particular state between 2006 and 2016. For example, candidates seeking licensure in Pennsylvania during this timeframe would have needed a score of at least 140 (i.e., the state's median cut score in this 10-year period) to be recommended for certification in that state. These cut scores ranged from 126 to 153 (which correspond to approximate percent correct values of 46% and 64%) across all of the states that accepted the Praxis physics subject assessment in the past decade. The national median value of 140 (approximately 54%) was set as our estimated standard across all states for passing the exam when performing the analyses described in the subsequent sections.

B. Demographics
Several relevant personal and professional characteristics of Praxis physics subject assessment test takers in the past decade are summarized in Table II. A total of 9667 individuals took the Praxis physics subject assessment between June 2006 and May 2016, 73.1% of whom were likely to have passed by our previously defined standard (i.e., scaled score greater than or equal to 140). Female examinees made up 36.9% of the population and passed at a rate 20 percentage points lower than that of their male counterparts (i.e., 60.4% compared to 80.4%). An overwhelming majority of the test-taking population reported their race as White (85.6%), while Black and Hispanic individuals represented just 3.6% and 1.7% of examinees, respectively. These underrepresented groups have historically passed the exam at much lower rates, with the pass rate of White test takers (73.7%) being approximately 1.7 times that of Black test takers (43.6%). Black-White and Hispanic-White DIF analysis revealed an average of 13.2% and 6.3% category C items (i.e., those exhibiting relatively large performance differentials), respectively, per test form (see Appendix, Table VI), though this percentage has dropped to 0% for both sets of focal and reference groups in the two most recent administrations of the exam. More than three-quarters (78.7%) of examinees reported undergraduate GPAs above 3.0, and those with higher GPAs have had a higher pass rate. Additionally, physics and engineering majors, who have had relatively high pass rates, have made up just under one-third (32.5%) of the test-taking population in the past decade.
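The pass-rate comparisons above can be read either as absolute differences (percentage points) or as relative differences (ratios), and the two give different-looking numbers. The short sketch below, using only the pass rates quoted in the text, shows both readings; the variable names are illustrative.

```python
# Pass rates (percent) quoted in the text.
male_pass, female_pass = 80.4, 60.4
white_pass, black_pass = 73.7, 43.6

# Absolute gap, in percentage points.
gender_gap_points = male_pass - female_pass

# Relative gap: how much lower the female rate is as a fraction of the male rate.
gender_relative_drop = 1 - female_pass / male_pass

# Ratio of the White pass rate to the Black pass rate.
white_to_black_ratio = white_pass / black_pass

print(f"Gender gap: {gender_gap_points:.1f} percentage points")
print(f"Relative reduction: {gender_relative_drop:.1%}")
print(f"White-to-Black pass-rate ratio: {white_to_black_ratio:.2f}")
```

The gender gap is 20 percentage points in absolute terms, which corresponds to a relative reduction of about 25%, while the White-to-Black pass-rate ratio is about 1.7.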

C. Undergraduate and graduate major
A prospective physics teaching candidate's undergraduate major explained 15% of the overall variance in Praxis physics subject assessment performance (Table I). Physics and engineering majors outperformed others by about 10-20 scaled points, with examinees with chemistry, biology, (STEM) education, and both types of other majors performing within 10 points of one another [Fig. 2(a)]. However, both biology and non-STEM other majors have performed just at the national median cut score, while all other examinees have had average scores above this threshold. Though graduate major was not identified as a significant predictor of performance [possibly due to the lower reporting rate (59%) to this questionnaire item and, therefore, limited data available for analysis], it is included for comparison purposes [Fig. 2(b)]. Physics and engineering majors were still the highest performers, but the subpopulation of graduate majors in the field did not perform much better than the population of undergraduate majors. Test takers with reported graduate majors in (STEM) education outperformed examinees with majors in chemistry or other STEM disciplines, and performed roughly 8 scaled points higher (151) than the population of (STEM) education undergraduate majors (143). Examinees reporting graduate majors in chemistry and both "other STEM" and non-STEM disciplines performed several scaled points higher than those with only undergraduate majors in these areas. Finally, graduate majors in biology performed as well as undergraduate majors in this field.

D. Gender
An examinee's gender was also identified by our model as a significant predictor of Praxis physics subject assessment performance [Fig. 3(a)]. As shown in Fig. 3, males consistently scored approximately 10 points higher than females, an achievement gap that has persisted in each of the past ten years with little year-to-year variation in the mean difference. Examinees of both genders performed at or above the median passing score across states (dashed line) in each of the past ten years, though average scores for females have been within 5 points of the median in each year. A comparison of performance between male and female physics and engineering majors [Fig. 3(b)] reveals that while scores for both groups are higher than those for average males and females [Fig. 3(a)], the difference in performance between genders remains even after controlling for undergraduate major. Gender DIF analysis revealed an average of 17.6% category C items per test form, or that 17.6% of exam items (on average) exhibited a difference in performance such that the probability of answering correctly was at least 89% higher for males than females. Notably, however, this percentage of category C items has decreased to approximately 7% and 5% on two of the most recent (i.e., 2016) administrations of the exam (see Appendix, Table V).
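The "at least 89% higher" figure quoted for category C items can be recovered from the 1.5 threshold if one assumes the usual ETS delta-scale definition, MH D-DIF = -2.35 ln(α), where α is the common odds ratio between the reference and focal groups. The source does not spell out this conversion, so the sketch below is an assumption-labeled check rather than the authors' calculation; under this definition the quantity is strictly a ratio of odds.

```python
import math

DELTA_SCALE = 2.35  # assumed ETS delta-scale constant


def odds_ratio_for(mh_d_dif_magnitude):
    """Odds ratio implied by a given MH D-DIF magnitude (assumed definition)."""
    return math.exp(mh_d_dif_magnitude / DELTA_SCALE)


threshold_c = odds_ratio_for(1.5)         # category C threshold
percent_higher = (threshold_c - 1) * 100  # excess odds for the reference group

print(f"Odds ratio at |MH D-DIF| = 1.5: {threshold_c:.2f}")
print(f"Reference-group odds higher by about {percent_higher:.0f}%")
```

An MH D-DIF magnitude of 1.5 corresponds to an odds ratio of roughly 1.89, which matches the 89% figure in the text.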

E. Undergraduate GPA
An examinee's undergraduate GPA was also shown to be significantly correlated with performance on the Praxis physics subject assessment between 2006 and 2016 (Fig. 4). Those with higher GPAs tended to outperform those with lower academic standing, and examinees with GPAs in the highest bracket outperformed others by 5-10 points overall. However, on average, even examinees with GPAs below 2.5 have scored above the median (dashed line) by several points.

F. Race or ethnicity
While an examinee's race or ethnicity was not a significant predictor of Praxis physics subject assessment performance as determined by our model (possibly due to the small number of non-White examinees available for analysis), the documented issue of diversity in the physics teaching community warrants a presentation of these results [Fig. 5(a)]. White test takers and those of other races or ethnicities have performed similarly during the analyzed timeframe, with decade-wide average scaled scores of 151 and 152, respectively. The variability in scores for Hispanic examinees is likely a function of the low numbers of reported test takers in each year, but they have performed at the level of White test takers and those of other races or ethnicities in six of the ten years studied. While analyses of the performance of Black examinees are similarly limited by low exam participation, these test takers have consistently scored lower than these groups, by margins ranging from 11 scaled points (in 2011) to as much as 24 scaled points (in 2006). Further, Black test takers have scored below the national median cut score (dashed line) in all but two of the ten years analyzed, whereas average scaled scores for all other reported races or ethnicities have been above our established standard. Even after controlling for undergraduate majors in physics and engineering [Fig. 5(b)], Black examinees averaged 20-25 scaled points lower during this timeframe than all other test takers. Race or ethnicity DIF analysis revealed an average of 16.5% and 10.5% category C items when comparing performance between equally performing (i) White and Black test takers and (ii) White and Hispanic test takers, respectively (see Appendix, Table VI). Specifically, these data indicate that 16.5% of exam items (on average) exhibited a difference in performance such that the probability of answering correctly was at least 89% higher for White examinees than Black examinees, and that the same was true of 10.5% of items with respect to White and Hispanic examinees.

VI. DISCUSSION
Our analysis of Praxis physics subject assessment examinees in the past ten years reveals a number of findings about the demographic makeup of prospective high school physics teachers and their exam performance. First, state-to-state passing scores required for candidates to be recommended for certification have ranged from 126 to 153 across the nation (Fig. 1), with a median of 140 across states corresponding to an estimated 54% correct. While state-to-state variations in cut scores may be partially influenced by external factors, such as teacher shortages [9], it is important to consider the educational impact of certifying candidates who have not demonstrated a firm grasp of the subject. Encouragingly, candidates reporting GPAs between 3.5 and 4.0 comprised almost 80% of examinees over the past decade (Table II) and have outperformed candidates with lower GPAs (Fig. 4). While physics and engineering majors have earned some of the highest exam scores relative to those of other majors (Fig. 2), they have comprised just under one-third of the overall examinee population. Additionally, the participation and performance of women in physics has been a growing concern of the physics education community and is apparent in our sample of prospective high school physics teachers taking the Praxis physics assessment [15,16]. The percentage of females who sat for the Praxis physics subject assessment in the past decade has increased only slightly over this time frame (33.7% in 2006 to 40.2% in 2015), and a gender achievement gap has persisted across all exam years analyzed (Fig. 3). While an average of 30% of exam items exhibited differential performance between males and females in the past decade, it is encouraging to note that gender DIF analysis has demonstrated a substantially lower percentage of category C items on the two most recent exam forms. The lack of racial diversity that has been evident in the physics education community is also reflected in our analysis [17,19,29]. Black and Hispanic Praxis physics subject assessment test takers represented just 3.6% and 1.7% of test takers (Table II), while they comprise 13% and 18% of the U.S. population, respectively [29]. Additionally, using our estimated passing standard, both Black and Hispanic examinees have passed at lower rates than White and Asian candidates. Fewer than half of all Black test takers passed the Praxis physics subject assessment over the past ten years, and Black examinees have underperformed relative to all other races or ethnicities even after controlling for undergraduate major.
Our findings should inform understandings and decisions about the quality, recruitment, and preparation of the high school physics teaching workforce. While physics and engineering majors have averaged a scaled score of 158, far exceeding the scores of examinees reporting other majors, this corresponds to an estimated percentage score of 66%. This performance should alert physics faculty to issues with long-term retention of knowledge from introductory physics courses (which these majors are likely to have taken). It is possible that more traditional curricula and approaches to teaching physics may not appropriately facilitate conceptual retention beyond end-of-course exams [30]. Instructional shifts toward reform-based practices and curricula may be especially conducive to improving student learning in the discipline [31,32]. STEM education majors (who are most likely to pursue teaching careers) were among the lowest performers, averaging a scaled score of 143 (∼56%). Attempts to more closely coordinate the efforts of STEM education departments with disciplinary departments may provide both students and faculty with the resources necessary for mutual success. Learning assistant models, for example, have proven mutually beneficial to both future K-12 educators and participating university faculty in shifting teaching practices [33].
Additionally, even after controlling for physics and engineering majors, a gender achievement gap was apparent on the Praxis physics subject assessment. Furthermore, this achievement gap has remained consistently large during the 2006-2016 timeframe. These data speak to the need to more critically evaluate underlying factors of the U.S. physics educational system that may contribute to these differences, including efforts aimed at changing the culture of physics departments, physics courses, and the discipline of physics in general [34][35][36]. Furthermore, a number of studies have suggested that reform-based pedagogies have served not only to improve the learning of physics for students broadly but also to reduce or eliminate gender achievement gaps in university physics classrooms [37][38][39][40]. As many of these suggested strategies for change rely on the localized efforts of individual faculty members, this may be an important population to target as part of continued efforts to eliminate performance disparities between males and females.
Lastly, the lack of diversity among physics teaching candidates, which has been an ongoing problem for the community [20,22,41], is evident in the low participation rates and performance among Black and Hispanic examinees. Black examinees in particular have underperformed relative to others even among physics and engineering majors. As with efforts related to gender inequities, the culture of physics likely plays a large role in the success of underrepresented groups. For example, studies have noted that candidates' race or ethnicity is often not considered as part of the criteria for acceptance into physics doctoral degree programs [42], and this population of students is most likely to transition into future physics faculty members. It is possible that until the face of physics more equitably reflects the diversity of the population at large (which largely depends on current university educators), underrepresentation will continue to afflict the discipline at all levels.

VII. LIMITATIONS
We acknowledge that while the Praxis physics subject assessment series is the most nationally representative source of information about the content knowledge of teaching candidates, data from individual teacher certification exams across several of the most populous states are not encompassed in our analysis. Our established national cut score of 140, while necessary for the interpretation of our data and based on a precedent set by previous work [23], is somewhat arbitrary, and we recognize that other approaches to determining passing status may have yielded slightly different results. For example, using the average of the average state cut scores during this timeframe (138) would have produced higher percent passing rates in Table II than currently reported. Additionally, the estimated passing standard does not account for whether candidates actually passed the exam in the state in which they sought certification. It is important to note the nuances in examinee motivation for taking the Praxis physics subject assessment as it pertains to our reported results. While some test takers do sit for the exam to become certified in physics as their primary specialty, a number of examinees are likely to have taken the exam to obtain a secondary or tertiary subject certification to improve their competitiveness on the job market. It is, therefore, likely that some percentage of the test takers captured in our data set will never go on to teach physics. With respect to the interpretation of race or ethnicity data, it is important to note that the number of nonrespondents (566 examinees) is larger than the number of Black or Hispanic respondents, which may impact the observed results. DIF analysis does not incontrovertibly define test items as biased. Items exhibiting differential performance should be more carefully examined to uncover possible explanations for observed differences. Finally, although certification exams are used as a proxy for teacher effectiveness by measuring content knowledge, we also recognize that several other key variables not accounted for here (e.g., pedagogical content knowledge, school context) may have a significant impact on the success of beginning teachers.

FIG. 1. Median Praxis physics subject assessment cut scores from 2006 to 2016. Individual data reported by each state were used to determine the minimum scaled score needed to be awarded certification in the state. The median of these values for each state between 2006 and 2016 is depicted. States were assigned a score if they accepted Praxis physics subject assessment testing for certification in any year within this timeframe. Those that did not are shaded in gray. A scaled score of 140 corresponds to a raw percentage score of approximately 54%. Source: Derived from data provided by the Educational Testing Service.

FIG. 2. Praxis physics subject assessment performance by examinees' reported (a) undergraduate major and (b) graduate major. Scaled score is plotted on the y axis. Dashed line represents the median passing score of 140 across states. Source: Derived from data provided by the Educational Testing Service.

FIG. 3. Praxis physics subject assessment performance by (a) all examinees' reported gender and (b) the indicated gender of examinees reporting physics and engineering undergraduate majors. Scaled score is plotted against academic year (i.e., academic year 2015 = June 2015-May 2016). Dashed line represents the median passing score of 140 across states. Source: Derived from data provided by the Educational Testing Service.

FIG. 5. Praxis physics subject assessment performance by (a) examinees' reported race or ethnicity and (b) the reported race or ethnicity of physics and engineering majors. Scaled score is plotted on the y axis. Dashed line represents the national median passing score of 140. Source: Derived from data provided by the Educational Testing Service.

TABLE I. Stepwise linear regression models including top examinee characteristics most strongly associated with performance on the Praxis physics subject assessment from 2006 to 2016.

TABLE II. Personal and professional characteristics of Praxis physics subject assessment examinees from 2006 to 2016.

TABLE III. Detailed list of personal and professional characteristics of Praxis physics subject assessment examinees from 2006 to 2016. Source: Derived from examinee questionnaire data filled out during test registration, provided by the Educational Testing Service.
This section contains a complete list of examinee characteristics, individual state cut scores from 2006 to 2016, and a summary of DIF analysis.

TABLE III. (Continued)
a Undergraduate GPA categories of 2.5-2.99, 2.0-2.49, 1.5-1.99, and below 1.5 were grouped into this collective category.
b Examinees were grouped into census regions using a state postal code in the original data set.
c This question only became part of the questionnaire beginning in 2008.

TABLE IV. Minimum passing scaled scores for 37 U.S. states and D.C. that have accepted the Praxis physics subject assessment from 2006 to 2016. Data were retrieved from annually published Praxis II series passing score by test and state documents from ETS [43-53].

TABLE V. Gender differential item functioning analysis of Praxis physics subject assessment test forms from 2006 to 2016. Item categories (ABC) were determined using form-specific scores for males (reference group) and females (focal group).

TABLE VI. Race or ethnicity differential item functioning analysis of Praxis physics subject assessment test forms from 2006 to 2016. Item categories (ABC) were determined using form-specific scores for White (reference group) and each listed ethnicity (focal group). Data are listed as "focal group vs reference group." DIF analyses for test forms 5265134 and 5265142 are not included due to an insufficient number of test takers of each race or ethnicity for these most recently administered forms.
a Test required, but cut score not available.
b Test not required.