Correlating Student Interest and High School Preparation with Learning and Performance in an Introductory University Physics Course

We have studied the correlation of student performance in a large 1st-year university physics course with their reasons for taking the course and whether or not the student took a senior-level high-school physics course. Performance was measured both by the Force Concept Inventory and by the grade on the Final Examination. Students who took the course primarily for their own interest outperformed students who took the course primarily because it was required, both on the Force Concept Inventory and on the Final Examination; students who took a senior-level high-school physics course outperformed students who did not, also on both the Force Concept Inventory and on the Final Exam. Students who took the course for their own interest and took high school physics outperformed students who took the course because it was required and did not take high school physics by a wide margin. However, the normalized gain on the Force Concept Inventory was the same within uncertainties for all groups and subgroups of students.

Lecture Demonstrations 5 are used extensively in the classes. There are two hours of class every week.
In addition, traditional tutorials and laboratories have been combined into a single active learning environment, which we call Practicals; 6 7 and Laws' Workshop Physics. 8 In the Practicals students work in teams of four on conceptually based activities using a guided discovery model of instruction. Whenever possible the activities use a physical apparatus or a simulation. Some of the materials are based on activities from McDermott and from Laws. There are two hours of Practicals every week.

II. METHODS
The FCI was given during the Practicals, the Pre-Course one during the first week of classes and the Post-Course one during the last week of classes. There is a small issue involving the values to be used in analyzing both the Pre-Course and the Post-Course FCI numbers. In our course, 868 students took the Pre-Course FCI, which was over 95% of the students who were currently enrolled; 663 students took the Post-Course FCI, which was over 95% of the students who were still enrolled at that time. Between the Pre-Course and Post-Course FCI dates, 223 students had dropped the course; this dropout rate of about 25% is typical for this course. In addition, 22 students added the course later or for another reason did not take the Pre-Course FCI, but did take the Post-Course FCI. With one exception that is noted below, all data and analysis below use "matched" values, i.e. the 641 students who took both the Pre-Course and the Post-Course FCI. In all cases, the difference between using raw data or matched data is only a few percent. These small differences between matched and unmatched data are consistent with a speculation by Hake for courses with an enrollment > 50 students. 9
Figure 1(a) shows the Pre-Course FCI scores. The distribution is not well modeled by a Gaussian, due to the tendency for scores to flat-line at higher values. Figure 1(b) shows the Post-Course FCI scores, which do not conform to a Gaussian distribution at all. Therefore, in the analysis below we will use the medians and quartiles instead of computed means and standard deviations to characterize the FCI results. Appendix A lists the values of the quartiles for the data shown in Figure 1, plus all other quartile values discussed below.
The Final Exam in the course was 2 hours long. It had 14 conventional problems (3 algebraic and 11 numeric), conceptual questions which included only words, figures, and/or graphs, and one question on uncertainty analysis. The Exam had 12 multiple-choice questions worth 5 points each, and 2 long-answer questions which were marked in detail with some part marks available. On the multiple-choice section, 8 of the questions were traditional problems; in the long-answer section 12 of the available 20 points were traditional problems. Table I shows the overall relative weighting of these questions. We should emphasize that the "conceptual" questions were more tightly focused than the typical question on the FCI, and in no case were the questions on the Exam based on FCI ones. Also, note that the majority of the Exam was testing conventional problems.
Figure 2 shows the grade distribution. It can be approximately modeled as a Gaussian, so we use the mean and standard deviation to characterize the distribution. Here the value for the mean is 68, and the standard deviation is 18. At the University of Toronto, a grade of 68 is a C-plus.
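The selection of "matched" students and the use of quartiles described above can be sketched in a few lines of Python. This is only an illustration: the student identifiers and scores are hypothetical toy data, and the "inclusive" quantile convention is an assumption, since the text does not say which convention was used. In the real data the same intersection corresponds to the 663 Post-Course takers minus the 22 students without a Pre-Course score, i.e. the 641 matched students.

```python
from statistics import quantiles


def matched_ids(pre_scores, post_scores):
    """Return the sorted IDs of students who appear in both score tables."""
    return sorted(pre_scores.keys() & post_scores.keys())


def quartiles(scores):
    """Return (lower quartile, median, upper quartile) of a list of scores.

    The 'inclusive' method is an assumed convention, not one stated in the paper.
    """
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical toy data: student ID -> FCI score in percent.
pre = {"s1": 40.0, "s2": 53.3, "s3": 70.0, "s4": 26.7, "s5": 60.0}
post = {"s2": 76.7, "s3": 90.0, "s4": 50.0, "s5": 80.0, "s6": 66.7}

ids = matched_ids(pre, post)  # s1 dropped the course, s6 joined late
pre_q = quartiles([pre[i] for i in ids])
post_q = quartiles([post[i] for i in ids])
print(ids, pre_q, post_q)
```

Keying both tables by student ID makes the matching a plain set intersection, so students who dropped or joined late fall out automatically.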

Figure 2. Final Exam scores for "Matched" Students
We asked the students 6 questions about their reason for taking the course and some background information about themselves. We collected this data during the second week of classes with clickers. Appendix B lists the questions and percentage of student answers. The only factors that gave statistically significant differences in student performance were their reason for taking the course, Question 2, and whether or not they had taken a senior-level high-school physics course, Question 4. For some other questions, such as Question 6 on whether the student has previously started but dropped the course, the lack of a correlation may be due to the fact that the percentage of students who had previously dropped the course was so small that the uncertainties in the results were overwhelming.
Students receive a small number of points towards their grade for answering clicker questions in class. However, only about 75% of the matched students answered these questions. These comparatively low numbers surprised us. Perhaps some students had not yet gotten their clickers, or hadn't remembered to bring them to class, or didn't bother to answer these questions. We note that this unfortunate loss of nearly 25% of our sample size could have been avoided if we had included these questions on the Pre-Course FCI. Nonetheless, we believe that using the data for students who did answer these questions gives us a reasonable profile of the class.

III. STUDENT REASONS FOR TAKING PHY131
As shown in Appendix B, the question that we asked the students about their reasons for taking our course, with the percentage of students in each category in parentheses, was: What is the main reason you are taking PHY131?
A. It is required (32%)
B. For my own interest (16%)
C. Both because it is required and because of my own interest (52%)
Figure 3 shows boxplots of the Pre-Course FCI scores for each category of student interest. The "waist" on the box plot is the median, the "shoulder" is the upper quartile, and the "hip" is the lower quartile. The vertical lines extend to the largest/smallest value less/greater than a heuristically defined outlier cutoff. 11 Also shown in the figure are the statistical uncertainties in the value of the medians. 12
Figure 3. Boxplots of the Pre-Course FCI scores for different reasons for taking PHY131
As seen in Figure 4, the same correlation with student interest was seen in student performance on the Post-Course FCI, although the overall median score was higher for the Post-Course test (77%) than the Pre-Course one (53%). The dot represents a data point that is considered to be an "outlier."
The different student reasons for taking PHY131 were also reflected in the Final Examination grades in the course, as shown in Table II. The errors are the standard error of the mean \(\sigma_m \equiv \sigma/\sqrt{N}\), where σ is the standard deviation and N is the number of students. Also shown in parentheses are the corresponding letter grades of the means according to University of Toronto standards. Appendix C discusses the p-values for these distributions plus the 2 groups of the next section.
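As a rough illustration of the standard error of the mean defined above, the following Python sketch evaluates σ/√N. The helper name `standard_error` is hypothetical, and the use of the sample (N − 1) standard deviation is an assumption, since the text does not specify it. The second calculation plugs in the overall Final Exam values quoted earlier (σ = 18) and assumes all 641 matched students are included.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Standard error of the mean, sigma / sqrt(N).

    Uses the sample standard deviation (an assumption; the paper does not
    say whether the sample or population form was used).
    """
    return stdev(values) / sqrt(len(values))


# Hypothetical toy grades, just to exercise the helper.
toy_sem = standard_error([50, 60, 70, 80])

# Overall Final Exam: sigma = 18 quoted in the text, N = 641 assumed.
sem_all = 18 / sqrt(641)
print(round(toy_sem, 2), round(sem_all, 2))
```

Because the uncertainty shrinks as 1/√N, the smaller sub-groups discussed in later sections necessarily carry larger uncertainties than the class as a whole.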

IV. SENIOR-LEVEL HIGH SCHOOL PHYSICS
In Ontario the senior-level high school physics course is commonly called "Grade 12 Physics." Grade 12 Physics or an equivalent course is recommended but not required for PHY131. As shown in Appendix B, 75% of our students took Grade 12 Physics, and 25% did not.
There have been surprisingly few studies of high school physics and later performance in university physics. Champagne and Klopfer studied 110 University of Pittsburgh students, and looked at many factors that might influence physics performance.
They found that there was a positive correlation between taking high school physics and performance on university physics course tests and exams, although their methodology, perhaps wisely, did not attempt to quantify the size of the effect. 13 In 1993 Hart and Cottle reported that taking high school physics correlated with a mean 6.02 ±1.09 increase in the final grade in university-level introductory physics for 508 students at Florida State University, 14 and in 2001 Sadler and Tai reported a 3.49 ± 0.57 increase in a study of 1,933 students at a variety of U.S. universities. 15 The differences between the values reported by Hart and Cottle vs. Sadler and Tai are not well understood. However, Hazari, Tai and Sadler in a massive study reported in 2007 showed that there are correlations between university physics course grades and the details of the curriculum of the high school physics course that the students took. 16 This result indicates that there is perhaps at least a small causal relationship between taking high school physics and university physics performance. Figure 5 shows the Pre-Course FCI scores for our students who did and did not take Grade 12 Physics. The boxplots for the Post-Course FCI scores looked similar except for an overall upward shift in the median values, so are not shown.

V. COMBINING INTEREST AND BACKGROUND
When we compare students who are primarily taking PHY131 for their own interest and who took Grade 12 Physics (61 students) with students who are primarily taking PHY131 because it is required and did not take Grade 12 Physics (48 students), the differences are quite dramatic, as shown in Figures 6 and 7, and Table IV. Note in Figure 6 that the interquartile ranges do not even overlap.
Taking PHY131 because it is required and did not take Grade 12 Physics: 58.6 ± 2.6 (D+)

VI. GAINS ON THE FORCE CONCEPT INVENTORY
The standard way of measuring student gains on the FCI is from a seminal paper by Hake. 17 It is defined as the gain divided by the maximum possible gain, often called the normalized gain G: \(G = \frac{\text{PostCourse\%} - \text{PreCourse\%}}{100 - \text{PreCourse\%}}\) (1) Clearly, G cannot be calculated for students whose PreCourse% score was 100. For our course, 9 students got perfect scores on the Pre-Course FCI and no value of G was calculated. In addition to these 9 students, there were 10 students whose PreCourse% was over 80%, and whose G was less than -0.66. Somewhat arbitrarily, we classified these 10 students as outliers and ignored their G values below: perhaps they were survey-fatigued and didn't try to do their best on the Post-Course FCI.
One hopes that the students' performance on the FCI is higher at the end of a course than at the beginning. The standard way of measuring the gain in FCI scores for a class is called the average normalized gain, to which we will give the symbol < g > mean, and was also defined by Hake in Reference 17: \(\langle g \rangle_{\text{mean}} = \frac{\langle \text{PostCourse\%} \rangle - \langle \text{PreCourse\%} \rangle}{100 - \langle \text{PreCourse\%} \rangle}\) (2) where the angle brackets indicate means. However, since the histograms of FCI scores such as Figure 1 are not well approximated by Gaussian distributions, we believe that the median is a more appropriate way of characterizing the results.
We will report < g > mean since it is standard in the literature, but will also report the normalized gain using the medians, < g > median , which is also defined by Eqn. 2 except that the angle brackets on the right hand side indicate the medians.
Recall that our study uses only "matched" FCI scores; the 10 student outliers are also excluded from our calculations of < g >.
The overall normalized gain for PHY131 was (< g > mean ,< g > median ) = (0.45 ± 0.02, 0.50 ± 0.03). The stated uncertainties are the propagated standard error of the means for the average normalized gain, and the interquartile ranges divided by \(\sqrt{N}\) for the median normalized gain. The value of the average normalized gain is consistent with other courses that, like ours, make extensive use of research-based "reformed" pedagogy.
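The gain definitions in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the (pre, post) score pairs are hypothetical, and the outlier rule simply encodes the "PreCourse% over 80% and G below -0.66" criterion given in the text.

```python
from statistics import mean, median


def individual_gain(pre, post):
    """Normalized gain G for one student; None when PreCourse% is 100."""
    if pre >= 100:
        return None  # maximum possible gain is zero, so G is undefined
    return (post - pre) / (100 - pre)


def is_outlier(pre, post):
    """Outlier rule from the text: PreCourse% > 80 and G < -0.66."""
    g = individual_gain(pre, post)
    return g is not None and pre > 80 and g < -0.66


def class_gain(pre_scores, post_scores, center=mean):
    """Class-level gain; center=mean gives <g>_mean, center=median <g>_median."""
    pre_c, post_c = center(pre_scores), center(post_scores)
    return (post_c - pre_c) / (100 - pre_c)


# Hypothetical matched (PreCourse%, PostCourse%) pairs.
pairs = [(20.0, 70.0), (53.3, 76.7), (90.0, 80.0), (40.0, 70.0), (60.0, 83.3)]
kept = [(a, b) for a, b in pairs if not is_outlier(a, b)]
pre_kept = [a for a, _ in kept]
post_kept = [b for _, b in kept]

g_mean = class_gain(pre_kept, post_kept, mean)
g_median = class_gain(pre_kept, post_kept, median)
print(len(kept), round(g_mean, 3), round(g_median, 3))
```

Passing the centering function as a parameter keeps the mean-based and median-based versions on exactly the same formula, which mirrors how the two quantities are defined in the text.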
The normalized gains for all the categories and sub-categories of students discussed above were consistent with being the same as the overall value for the course.

Table V
Student Category (< g > mean, < g > median)
Taking the course because it is required and did not take Grade 12 Physics (0.46 ± 0.05, 0.45 ± 0.06)

To the extent that the normalized gain <g> measures the effectiveness of instruction, then, the data indicate that the pedagogy of PHY131 is equally effective for all groups and sub-groups of students. As the saying goes: "A rising tide lifts all boats."

VI. DISCUSSION
Our goal was to determine if a student's interest in physics and/or involvement in a senior-level high school physics course had any effect on student success in a large Canadian university physics course. To our knowledge this is the first time this has been attempted in such an institution. Although our results may be applicable to other institutions in other countries, we are not aware of any data to support this except for the correlation with whether the student took high-school physics in References 13 -16.
We found evidence that taking physics for their own interest and having taken a senior-level high school physics course were both indicators for success on the Final Exam. Although the Pre-Course and Post-Course FCI scores were different for these groups and sub-groups of students, neither interest nor background correlated within experimental uncertainties with the normalized gains on the FCI.
However, as shown in Table V, the highest performing group of students, those who took the course for their own interest and took Grade 12 Physics, also had the highest median normalized gain of 0.63 ± 0.10, while the lowest performing group, those who took the course because it was required and did not take Grade 12 Physics, had the lowest median normalized gain of 0.45 ± 0.06. The difference between these two values is 0.18 ± 0.12, which is perhaps suggestive of a non-zero value but the difference from zero is not statistically significant.
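The quoted difference of 0.18 ± 0.12 is what one obtains by combining the two uncertainties in quadrature, which assumes the two groups are statistically independent. The text does not state the propagation method, so the short Python check below is a sketch under that assumption.

```python
from math import hypot

# Median normalized gains and uncertainties quoted in the text.
g_high, err_high = 0.63, 0.10
g_low, err_low = 0.45, 0.06

diff = g_high - g_low
# Quadrature sum, assuming independent uncertainties (not stated in the paper).
diff_err = hypot(err_high, err_low)
n_sigma = diff / diff_err
print(round(diff, 2), round(diff_err, 2), round(n_sigma, 1))
```

Under this assumption the difference sits about 1.5 standard uncertainties from zero, in line with the statement that it is suggestive but not statistically significant.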
There are, of course, other variables that correlate with physics performance for which we have not collected data; these include gender, socio-economic background, and more. Hazari, Tai, and Sadler discuss many of these factors in Reference 16.
However, there is one factor that we have not studied which has been shown to have a measurable impact on FCI performance: the ability of students to think in a scientific way. Lawson has developed a Classroom Test of Scientific Reasoning (CTSR) 18 that is based on Piagetian taxonomy. 19 Coletta and Phillips studied the correlation of CTSR performance with the average normalized gain G (not <g> ) and found a positive correlation for students at Loyola Marymount University, but in an indirect argument propose that there is no such correlation for students at Harvard. 20 Coletta, Phillips, and Steinert added data on a positive correlation for students at Edward Little High School, 21 Diff and Tache found a positive correlation for students at Santa Fe Community College, 22 and Nieminen, Savinainen, and Viiri found a positive correlation for high school students in Finland. 23 Since the groups and sub-groups of students we studied have essentially the same median normalized gains <g> median , these CTSR-G studies lead to some very interesting questions. One is: do the various groups and sub-groups of students that we have studied have similar ability to reason in a scientific, formal operational way? Another related question is: are our students more like Harvard students than they are like students at, say, Loyola? Lacking data, we cannot answer either of these questions.
We should caution that when looking at the correlation between student performance and whether or not they took Grade 12 Physics, one should beware of assigning a cause and effect relationship to the data. For example, a student who knows (or perhaps just believes) that he or she is naturally weak in physics will tend to avoid taking Grade 12 Physics in order to keep a higher average grade. So is the student's ability to do well in physics determined by whether he or she took Grade 12 Physics, or perhaps vice versa? Furthermore, the two questions about student interest and high school background are not independent.
The students who avoid high school physics will also tend to be the students who are taking PHY131 mainly because it is required, and a higher percentage of students who voluntarily take high school physics will also tend to be taking PHY131 mainly for their own interest.
Considering the correlations of student background and interest with performance, either measured with the FCI or the course Final Examination, it is tempting to think of separating these widely divergent student populations. In 2002 Henderson looked at the idea of using FCI Pre-Course results for this purpose, and his data show that this is not appropriate: the FCI score does not do a good job of predicting success or failure in the class. 24 The ultimate failures in our course are the 25% of the students who dropped it, although the "failure" may be ours, not the students'. These are not "matched" students since they did not take the Post-Course FCI. The quartiles of their performance on the Pre-Course FCI were (27, 40.0 ± 2.0, 57), which are not radically lower than the matched students' quartiles of (37, 53.3 ± 1.3, 70). These dropouts had a similar profile of their reasons for taking the course, but 45% of them did not take a senior-level high school course compared to 25% of the matched students.
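The uncertainties attached to the two medians above can be reproduced if one assumes they follow the same rule stated earlier for the median normalized gain, namely the interquartile range divided by \(\sqrt{N}\). The Python sketch below makes that assumption explicit; it also assumes that the unmatched group consists of the 868 − 641 = 227 students who took only the Pre-Course FCI, a count the text does not state directly.

```python
from math import sqrt


def median_uncertainty(lower_q, upper_q, n):
    """Interquartile range divided by sqrt(N) (assumed rule for medians)."""
    return (upper_q - lower_q) / sqrt(n)


# Matched students: quartiles (37, 53.3, 70), N = 641.
matched_err = median_uncertainty(37, 70, 641)

# Students without a Post-Course score: quartiles (27, 40.0, 57);
# N = 868 - 641 = 227 is an assumption derived from the quoted counts.
dropped_err = median_uncertainty(27, 57, 868 - 641)
print(round(matched_err, 1), round(dropped_err, 1))
```

The two medians differ by 13.3 percentage points, several times either uncertainty, while the interquartile ranges of the two groups still overlap substantially.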
For the students who completed the course, 13 did not take a senior-level high school course, were taking our course mainly because it was required, and scored less than 25% on the Pre-Course FCI. Over half of these students, 7 out of 13, ended up passing the Final Examination and 2 of them received letter grades of B; these two students received final course grades of B+ and A- and achieved normalized gains G on the FCI of 0.50 and 0.65 respectively. There was also one student in this group who got a C+ on the Final Exam, a final course grade of B-, and scored an amazing normalized gain G of 0.85 on the FCI, improving his/her FCI score from 13.3% to 86.7%. We certainly do not want to have excluded these good students from our course.
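As a worked check of the last figure, the normalized gain for the student who went from 13.3% to 86.7% follows directly from the definition of G as the gain divided by the maximum possible gain. The second line restates the same result in raw item counts, assuming the 30-item version of the FCI (4 and 26 correct answers respectively).

```latex
\[
G = \frac{\text{PostCourse\%} - \text{PreCourse\%}}{100 - \text{PreCourse\%}}
  = \frac{86.7 - 13.3}{100 - 13.3}
  = \frac{73.4}{86.7} \approx 0.85
\]
\[
G = \frac{26 - 4}{30 - 4} = \frac{22}{26} \approx 0.85
\]
```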
Our data are based on students self-reporting with clickers on their main reason for taking the course, and whether or not they took a senior-level high school physics course. All surveys have a problem with the fact that the people being surveyed have a tendency to answer what they believe the surveyor wishes to hear, and our clicker-based one probably has the same problem.
We are unaware of any reason why a clicker-based survey may be more or be less biased than a paper-based one, a web-based

VII. FUTURE WORK
Coletta and Phillips in Reference 20 showed that there is a correlation between Pre-Course FCI scores and the normalized gain G for students at 3 of the 4 schools studied, Loyola Marymount University, Southeastern Louisiana University, and the University of Minnesota, but found no correlation for students at Harvard. They believe that there is a "hidden variable" affecting these correlations: the ability of students to reason scientifically. Our data, which are not shown, also show a positive correlation: fitting G vs. the Pre-Course FCI scores gave a slope of 0.00212 ± 0.00054 although, as discussed, we have not measured the hidden variable with the CTSR.
Administering the FCI under controlled conditions takes a total of one hour of precious time from our Practicals, which is about 5% of the total. We are also using the Colorado Learning Attitudes about Science Survey (CLASS), 25 but since we are reluctant to give up more class or Practical time we have made it an on-line survey.
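A straight-line fit of G against the Pre-Course score, with an uncertainty on the slope, can be sketched as below. The text does not say which fitting procedure was used, so ordinary unweighted least squares is an assumption here, and the data points are hypothetical. The last line only restates the quoted slope and its uncertainty as a ratio.

```python
from math import sqrt


def fit_line(x, y):
    """Unweighted least-squares fit y = a + b*x.

    Returns (intercept a, slope b, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_err = sqrt(resid / (n - 2) / sxx)
    return intercept, slope, slope_err


# Hypothetical data: Pre-Course FCI score (percent) and normalized gain G.
pre_percent = [20, 40, 60, 80]
gains = [0.40, 0.47, 0.49, 0.56]
a, b, b_err = fit_line(pre_percent, gains)

# Slope quoted in the text, expressed in units of its uncertainty.
quoted_ratio = 0.00212 / 0.00054
print(round(b, 4), round(b_err, 5), round(quoted_ratio, 1))
```

If the Pre-Course score is expressed in percent, the quoted slope corresponds to a change in G of roughly 0.2 across the full 0 to 100 range.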
Administering the CTSR under controlled conditions would take even more class or Practical time. In addition, we are concerned about inducing "survey fatigue" in our students by giving them too many diagnostic instruments. However, we are considering using the CTSR, perhaps in place of the FCI, and looking at the reasoning ability of the various groups and subgroups of students that we have discussed in this paper.

APPENDIX B
We asked the students to self-report on the reason they are taking the course and some background information about themselves. Here we summarise that data.

Percent
"Last year" 10%
"Two or more years ago" 9%
6. "Have you previously started but did not finish PHY131?"

APPENDIX C
Student's T-Test is well known for testing whether or not two distributions are the same. 26 It typically returns the p-value, sometimes referred to just as p: the probability of obtaining a difference at least as large as the one observed if the two distributions were in fact the same. By convention, if the p-value is < 0.05 then the two distributions are considered to be different.
However, the test assumes that the two distributions are both Gaussian, which is not the case for FCI scores. Two alternatives for non-Gaussian distributions are the Mann-Whitney U-Test 27 and the Kruskal-Wallis one-way analysis of variance. 28 Both of these are based on the median, not the mean. Kruskal-Wallis is an extension of Mann-Whitney that can deal with more than two samples, but assumes that the distributions have the same shape and differ only in the value of the medians. Both typically return p-values, which are interpreted identically to the p-value of Student's T-Test.
We are not aware of better alternatives to these ways of calculating p-values for our data, although none are perfect. In practice, for our data all three methods gave similar p-values in comparing the various groups and sub-groups of students, although our software, Mathematica, sometimes complained about the fact that the data do not really match the assumptions of the particular algorithm being used. Table VII summarizes some of the results. Note that for comparing three or more categories of students, we show the results for the only test that accepts such data, Kruskal-Wallis, although the assumption of distributions with the same shape is not really correct, except for the G values of the last row.
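For readers who want to see what a rank-based comparison involves, the following Python sketch implements a two-sided Mann-Whitney U test using the normal approximation with a tie correction. It is not the Mathematica computation used for Table VII: the function name, the choice of approximation (no continuity correction), and the two score lists are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample a, approximate p-value).
    """
    # Tag each value with its sample index (0 for a, 1 for b) and sort.
    combined = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        ties = j - i
        tie_term += ties**3 - ties
        rank_sum_a += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p


# Hypothetical FCI scores (percent) for two groups of students.
group_interest = [40, 50, 60, 70, 80]
group_required = [20, 30, 35, 45, 55]
u_stat, p_value = mann_whitney_u(group_interest, group_required)
print(u_stat, round(p_value, 3))
```

The normal approximation is rough for samples this small; with real class-sized groups it is much better behaved, and in practice a tested library routine such as SciPy's `scipy.stats.mannwhitneyu` or `scipy.stats.kruskal` would normally be preferred over a hand-written version.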