Validation and Administration of a Conceptual Survey on the Formalism and Postulates of Quantum Mechanics

We developed and validated a conceptual survey that focuses on the formalism and postulates of quantum mechanics covered in upper-level undergraduate quantum mechanics courses. The Quantum Mechanics Formalism and Postulates Survey (QMFPS) covers Dirac notation, the Hilbert space, state vectors, physical observables and their corresponding Hermitian operators, compatible and incompatible observables, quantum measurement, time dependence of quantum states and expectation values, and spin angular momenta. Here we describe the validation and administration of the survey, which has been administered to over 400 upper-level undergraduate and graduate students from six institutions. The QMFPS is valid and reliable for use as a low-stakes test to measure the effectiveness of instruction in an undergraduate quantum mechanics course that covers the relevant content. Instructors can also use the survey to assess student understanding of the formalism and postulates of quantum mechanics at the beginning and end of a graduate quantum mechanics course, since graduate students are expected to have taken an undergraduate quantum mechanics course that covers the content included in the survey. We found that, after the first semester of a junior/senior-level quantum mechanics course, undergraduate students who engaged with research-validated learning tools performed better on the QMFPS than students who did not. In addition, the performance of graduate students on the QMFPS after instruction in the first semester of a core graduate-level quantum mechanics course was significantly better than that of undergraduate students at the end of the first semester of an undergraduate quantum mechanics course. Comparison with the baseline data on the validated QMFPS presented here can help instructors assess the effectiveness of their instructional approaches.


I. INTRODUCTION
Learning quantum mechanics (QM) is challenging partly because it is abstract and nonintuitive, and students often inappropriately transfer ideas from classical mechanics to quantum mechanics [1][2][3][4]. Several studies have focused on student difficulties with concepts [5][6][7][8][9][10][11][12][13][14][15] and formalism [16][17][18][19][20][21][22][23][24] in QM. Before students can solve novel, complex problems, we must help them develop a coherent knowledge structure of the foundational concepts related to the formalism and postulates of quantum mechanics. Furthermore, developing a robust understanding of quantum mechanics requires a solid grasp of linear algebra, differential equations, and special functions [25]. Regardless of the mathematical complexity of quantum mechanics problems, students must develop a functional understanding of quantum mechanics. This entails developing a good knowledge structure of the underlying concepts and being able to reason systematically about the relevant concepts, while also developing the quantitative skills to solve problems instead of using a plug-and-chug approach.
Research-based conceptual surveys (in both free-response and multiple-choice formats) are useful tools for evaluating student understanding of various topics without focusing heavily on mathematical skills. Furthermore, carefully developed and validated surveys can play an important role in measuring the effectiveness of a curriculum and instruction. If well-designed multiple-choice pretests and posttests are administered before and after instruction in relevant concepts, they can provide one objective means of measuring the effectiveness of a curriculum and instructional approach in a particular course. Compared with free-response tests, multiple-choice tests are free of grader bias and can be graded very efficiently. Their results are also objective and amenable to statistical analysis, so different instructional methods or different student populations can be compared. In addition, good instructional design requires taking students' prior knowledge into account. An effective way to assess prior knowledge, i.e., what students know before instruction in a particular course, is to administer conceptual surveys as pretests; comparing pretest and posttest results then gives one objective measure of the effectiveness of instruction.
Several multiple-choice conceptual surveys have been developed for use in physics and astronomy courses [26]. For example, in introductory physics, researchers have developed many multiple-choice surveys to determine students' knowledge states at the beginning and end of instruction in a particular course and/or topic, e.g., the Force Concept Inventory, the Conceptual Survey of Electricity and Magnetism, the Rotational and Rolling Motion Survey, and the Energy and Momentum Survey [27][28][29][30]. Conceptual surveys have also been developed for use in QM courses [31][32][33]. For example, the Quantum Mechanics Conceptual Survey (QMCS) [31] was developed for sophomore-level modern physics courses. It contains 12 questions and focuses on wave functions and probability, wave-particle duality, the Schrödinger equation, quantization of states, the uncertainty principle, superposition, operators and observables, and tunneling. The Quantum Mechanics Concept Assessment (QMCA) [32] is a 31-item survey that focuses on the time-independent Schrödinger equation, time evolution, wave functions and boundary conditions, and probability; it can be used in an upper-division junior/senior-level QM course. The Quantum Mechanics Visualization Instrument (QMVI) was developed to evaluate students' conceptual understanding of core topics in quantum mechanics in the undergraduate curriculum, especially their visualization skills [33]. The Quantum Mechanics Assessment Tool (QMAT) gauges student learning in a first-semester junior-level quantum mechanics course and focuses on wave functions, measurement, time dependence, probability, the infinite square well, one-dimensional tunneling, and energy levels [34]. The Introductory Quantum Physics Conceptual Survey (IQPCS) focuses on basic quantum concepts related to quantization and uncertainty [35].
The Quantum Mechanics Survey (QMS) [36] covers topics in non-relativistic quantum mechanics in one spatial dimension that are typically covered in the first semester of an upper-level undergraduate course, including possible wave functions, bound and scattering states, measurement, expectation values, the time dependence of wave functions and expectation values, stationary and non-stationary states, the role of the Hamiltonian, the uncertainty principle, and Ehrenfest's theorem. It can be used in most junior/senior-level quantum mechanics courses if the relevant concepts are covered.
However, previously developed conceptual surveys for use in QM courses do not focus explicitly on the postulates or formalism of quantum mechanics. For example, Dirac notation, the Hilbert space, state vectors, physical observables and their corresponding Hermitian operators, compatible and incompatible observables, projection operators, and writing operators in terms of their eigenstates and eigenvalues are not covered in other QM conceptual surveys. In addition, other previously developed QM surveys do not include concepts related to spin angular momenta. Therefore, we developed and validated a QM conceptual survey that focuses on the formalism and postulates of quantum mechanics and includes these concepts. Here, we discuss the development and validation of the Quantum Mechanics Formalism and Postulates Survey (QMFPS), a 34-item multiple-choice test appropriate for use as a post-test (after instruction in relevant concepts) in an upper-level undergraduate quantum mechanics course, or as a pre-test (at the beginning of the course) or post-test in a graduate-level quantum mechanics course [19]. The survey can be used to assess the final knowledge states of upper-level undergraduate students, and the initial and final knowledge states of graduate students, related to the formalism and postulates of quantum mechanics, and thereby to assess the effectiveness of a quantum mechanics curriculum in which the relevant concepts are covered. The results can also guide the development of instructional strategies that help students learn these concepts better.

II. QMFPS SURVEY DEVELOPMENT AND ADMINISTRATION
According to the standards for multiple-choice test design, a high-quality test has five characteristics: reliability, validity, discrimination, good comparative data, and suitability for the target population [37][38][39]. Furthermore, developing a well-designed multiple-choice test is an iterative process that involves recognizing the need for the test, formulating the test objectives, constructing test items, performing content validity and reliability checks, and distribution [37][38][39]. Below, we describe the development of the QMFPS and how we ensured that the test met the standards of multiple-choice test design.
Development of the survey: We recognized the need for a conceptual survey focused on the formalism and postulates of quantum mechanics because previously developed QM conceptual surveys do not focus explicitly on these topics. In particular, no existing QM survey focuses explicitly on Dirac notation, the Hilbert space, state vectors, physical observables and their corresponding Hermitian operators, compatible and incompatible observables, projection operators, or writing operators in terms of their eigenstates and eigenvalues. Furthermore, other QM surveys do not include concepts related to spin angular momenta. Therefore, we developed the QMFPS, which assesses students' conceptual understanding of the formalism and postulates of QM, including Dirac notation, the Hilbert space, state vectors, physical observables and their corresponding Hermitian operators, compatible and incompatible observables, quantum measurement, time dependence of quantum states and expectation values, and spin angular momenta. The final version of the survey is available on PhysPort [41]. Table I shows one possible categorization of the survey questions by concept, although the questions could be categorized in many other ways.
While designing the survey, we focused on making sure that it is valid and reliable [37][38][39]. Validity refers to the extent to which the test measures what it is supposed to measure, and reliability refers to the extent to which it consistently measures whatever it measures [37][38][39]. To ensure that the survey is valid for low-stakes group assessment of QM curricula and instructional approaches that focus on relevant topics, we consulted with 6 faculty members about the goals of their QM courses and the topics related to the formalism and postulates of quantum mechanics that their students should have learned in upper-level undergraduate QM. In addition to carefully examining the coverage of these topics in several upper-level undergraduate quantum mechanics courses, we reviewed homework, quiz, and exam problems that faculty teaching these courses at the University of Pittsburgh (Pitt) had given to their students before we started developing the survey. We also gave open-ended questions on relevant topics to students in upper-level QM and interviewed some students one-on-one to gain an in-depth understanding of the reasoning behind their responses. These interactions with faculty members and students helped us formulate the test objectives and construct the preliminary test items in initial versions of the survey. The faculty members were not only consulted before the survey questions were developed; we also iterated different versions of the survey with several instructors at Pitt at various stages of development to ensure that the test content was valid, i.e., that the test items matched the objectives of the test and were accurate, correctly formatted, and grammatically correct.
The faculty members reviewed different versions of the survey several times to examine its appropriateness and relevance for upper-level quantum mechanics courses and to detect any possible ambiguity in item wording. Their comments and feedback also helped ensure that the test was designed with the target population (upper-level undergraduate and graduate students) in mind, i.e., that the difficulty level of the questions was appropriate for this population. In addition to the analysis of the responses to the open-ended questions and interviews, the alternative (incorrect) choices for the multiple-choice questions were informed by prior research on common student difficulties with these QM topics [1-4, 14-16, 19-24]. Individual interviews were conducted with 23 students using a think-aloud protocol [40] at various phases of test development to better understand students' reasoning while they answered the open-ended and multiple-choice questions. In this protocol, students were asked to talk aloud while answering the questions so that the interviewer could follow their thought processes. The interviews were invaluable and often revealed previously unnoticed difficulties (not necessarily evident from written responses), which were incorporated into new versions of the survey. This allowed us to refine the survey further to ensure that the questions were relevant and clearly worded. The interviews also confirmed that the difficulty level of the test was appropriate for upper-level undergraduate and graduate students (i.e., that the test was designed with the target population in mind).
The final version of the survey can be accessed via the link in Ref. [41]. Each question has one correct choice and four incorrect choices. We find that almost all students are able to complete the QMFPS in one 50-minute class period after instruction in the relevant concepts. Students can answer the QMFPS questions without performing complex calculations, although they do need to understand the basics of linear algebra, which is central to the formalism and postulates of QM. The survey can be used as a post-test in a junior/senior-level undergraduate quantum mechanics course (e.g., at the level of the first four chapters of D. Griffiths' QM textbook [42]), as long as students have learned Dirac notation. It can also be administered in a graduate-level QM course as a pre-test or post-test to determine students' initial and final knowledge states regarding the formalism and postulates of QM. The QMFPS should not be administered as a high-stakes test, and the data should be interpreted for the class as a whole to gauge the effectiveness of instruction; nevertheless, we suggest that students receive some credit for completing the survey so that they take it seriously. For example, if the survey is given as a posttest, it can count as a graded low-stakes quiz. If the survey is given as a pretest in a graduate course, it can count as a quiz for which students receive full credit for trying their best.
Administration of the validated survey: The reliability check is performed during a large-scale administration of the final form of the test [37][38][39]. The validated QMFPS was administered to 464 students from 6 institutions over a period of four years.* Of the 464 students, 350 were undergraduate students and 114 were graduate students. The undergraduate students were enrolled in the first semester of a junior/senior-level QM course, and the graduate students were enrolled in a graduate-level core QM course. The undergraduate students completed the survey as a posttest at the end of their first semester of QM, and the graduate students completed it at least two months into the first semester of graduate-level quantum mechanics. Both groups worked through the survey during a 50-minute class period. Some of the undergraduate students (N = 43) were enrolled in QM courses that used research-based learning tools such as concept tests and quantum interactive learning tutorials. The survey was given twice to a subset of these students (N = 15), once at the end of the first semester and again at the beginning of the second semester after the winter break. This large-scale administration allowed us to collect comparative data by administering the test to the various groups of students for whom it was designed.
General Test Statistics: The average score on the survey after instruction is 41% (counting only the first score of students who took the survey twice). The standard deviation is 20%, and the highest score is 100%. The average score of undergraduate students is 37% and that of graduate students is 52%. That graduate students outperform undergraduate students provides another measure of content validity, since graduate students are expected to know these concepts better overall. The difference between the graduate and undergraduate students' average scores is significant (t-test, p < 0.001). Figure 1 shows a histogram of students' scores on the QMFPS.
The average posttest score of the upper-level students who used concept tests with group discussion and Quantum Interactive Learning Tutorials (QuILTs) was 58% (S.D. = 20%), whereas the average posttest score of the other undergraduate students, who did not use research-based learning tools, was 32% (S.D. = 16%). The difference between these two groups is significant (t-test, p < 0.0001).
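As an illustration of how such a group comparison can be computed from reported summary statistics, the following Python sketch calculates a two-sample t statistic and its degrees of freedom. The paper does not state which t-test variant was used; this sketch assumes Welch's unequal-variance test, and the function name `welch_t_from_summary` is hypothetical. The usage line also assumes that the comparison group contains 307 students (the 350 undergraduates minus the 43 who used research-based tools), which is an inference rather than a reported figure. The p-value would then come from the t distribution with the computed degrees of freedom; SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same calculation and also returns the p-value.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees
    of freedom, computed from group means, sample SDs, and group sizes.
    (Hypothetical helper; the paper does not specify its t-test variant.)"""
    v1 = sd1 ** 2 / n1  # squared standard error of the group-1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group-2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reported posttest percentages: research-based tools (58 +/- 20, N = 43)
# vs. other undergraduates (32 +/- 16). ASSUMPTION: the second group has
# 350 - 43 = 307 students; the paper does not state this size explicitly.
t, df = welch_t_from_summary(58, 20, 43, 32, 16, 307)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's variant is a cautious default here because the two groups differ greatly in size and spread; with equal variances assumed, the pooled Student's t-test would be the alternative.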

Reliability analysis:
We performed various statistical analyses to determine whether the QMFPS is reliable. If a test is administered twice at different times to the same sample of students, one would expect a highly significant correlation between the two test scores (test-retest reliability), assuming that the students' performance is stable and that the testing conditions are the same on each occasion [37][38][39]. Since testing students twice within a very short interval is not practical, one way to determine overall test reliability is via the Kuder-Richardson reliability index (KR-20), which measures the self-consistency of the entire test. According to the standards of test design [37][38][39], the KR-20 should be higher than 0.7 for a test to be considered reliable. The KR-20 for the QMFPS is 0.87, indicating that the survey is highly reliable.
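A minimal sketch of the KR-20 calculation follows, assuming that responses are scored 0/1 and stored as one list per student (a hypothetical data layout). It uses the standard formula KR-20 = k/(k − 1) · (1 − Σ p_i q_i / σ²), where k is the number of items, p_i is the fraction of students answering item i correctly, q_i = 1 − p_i, and σ² is the variance of the students' total scores. Population variance is used for σ² to match the p_i q_i terms; sources differ on this convention. The toy matrix is invented for illustration and is not QMFPS data.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously (0/1) scored items.

    responses: list of per-student lists, each holding one 0/1 score per item.
    """
    n_students = len(responses)
    k = len(responses[0])  # number of items on the test
    totals = [sum(row) for row in responses]  # each student's total score
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students  # fraction correct on item i
        pq_sum += p * (1 - p)  # p_i * q_i, the variance of item i
    var_total = pvariance(totals)  # population variance of total scores
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical toy data: 4 students x 3 items (not QMFPS data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75 for this toy matrix
```

A real analysis would apply the same function to the full student-by-item matrix; values above 0.7, as noted above, are conventionally taken to indicate adequate reliability.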
Item analysis can provide further insight into the survey's reliability. Table II shows that, on the QMFPS, the item difficulty (fraction of students answering correctly) ranges from 0.13 to 0.70 for undergraduate students and from 0.16 to 0.90 for graduate students. Table III shows the distribution of student responses for each survey item. We calculated the item discrimination for each survey item to ensure that the test is reliable. One way to measure item discrimination is the point-biserial coefficient, a measure of the consistency of a single test item with the whole test: it is the correlation between students' scores on an individual item and their scores on the entire test [37][38][39]. The point-biserial coefficient ranges from −1 to +1. If an item has a high point-biserial coefficient, students with high total scores are more likely to answer the item correctly than students with low total scores. A negative point-biserial value indicates that students with low overall scores were more likely to answer the item correctly than students with high overall scores, which suggests that the item is probably defective. Ideally, point-biserial coefficients should be above 0.2 [37][38][39]. Table II shows the point-biserial coefficient for each item on the QMFPS. The average point-biserial coefficient is 0.41, and the values range from 0.20 to 0.62. By the standards of test design [34,35], these values indicate that the survey questions have reasonably good item discrimination. The question with the lowest point-biserial coefficient, 0.20 (Q32), was also the most difficult question on the test for all students (item difficulty of 0.16).
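The item statistics described above can be sketched in Python as follows, again assuming a hypothetical 0/1 response matrix with one row per student. Item difficulty is the fraction correct, and the point-biserial coefficient is computed as the Pearson correlation between the 0/1 item score and the total score. This is the uncorrected form, in which the item itself is included in the total, matching the description above. Items below the 0.2 threshold (or with an undefined coefficient) are flagged. The function names and toy data are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when everyone (or no one) answers the item correctly
    return sxy / math.sqrt(sxx * syy)


def item_analysis(responses):
    """Return (difficulty, point_biserial) lists with one entry per item."""
    totals = [sum(row) for row in responses]  # total score per student
    difficulty, point_biserial = [], []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]  # 0/1 scores on item i
        difficulty.append(sum(item) / len(item))
        point_biserial.append(pearson(item, totals))  # uncorrected: item included in total
    return difficulty, point_biserial


# Hypothetical toy data: 4 students x 3 items (not QMFPS data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
difficulty, rpb = item_analysis(toy)
# "not r >= 0.2" also catches NaN, so undefined items are flagged too
flagged = [i + 1 for i, r in enumerate(rpb) if not r >= 0.2]
print(difficulty, [round(r, 2) for r in rpb], flagged)
```

Some analyses instead use the corrected point-biserial, which removes the item from the total before correlating; that variant gives somewhat lower values on short tests.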
Another aspect of survey validity is construct-related validity, which concerns understanding the nature of the characteristics being measured and the consequences of the uses and interpretations of the results [37][38][39]. A construct is an individual characteristic that is used to explain performance on an assessment. For example, mathematical reasoning is a construct that can be used to explain students' performance on a mathematics assessment [37][38][39]. In our survey, understanding of the formalism and postulates of QM is a construct that can be used to explain performance on the QMFPS. One way to collect evidence of construct validity is through related-measures studies, which investigate correlations between different assessment measures. For example, one would expect a positive correlation between the Force Concept Inventory and the Force and Motion Conceptual Evaluation, since they were designed to measure similar constructs (i.e., students' understanding of force and motion) [37][38][39]. Therefore, to establish the construct validity of the QMFPS, we examined whether students' QMFPS scores were correlated with their scores on other validated QM surveys and with their performance in quantum mechanics courses. Eighty-two undergraduate students enrolled in the first semester of an upper-division undergraduate QM course were given both the QMFPS and the Quantum Mechanics Survey (QMS), a previously validated survey that focuses on students' understanding of non-relativistic QM in one dimension, after traditional instruction in the relevant concepts. Figure 2 shows a strong correlation between students' scores on the QMS and the QMFPS. This correlation provides evidence of construct validity for the QMFPS, because students who performed well on the QMS are likely to have a better foundation in the formalism and postulates of quantum mechanics and therefore to perform better on the QMFPS.
The QMFPS tends to be more difficult for students than the QMS, possibly because it covers more advanced topics, whereas the QMS is restricted to QM in one spatial dimension. In addition, 44 graduate students enrolled in the first semester of a graduate-level core quantum mechanics course were given the QMFPS at the end of the semester. Figure 3 shows a moderate correlation between students' scores on the QMFPS and their scores on the final exam in the graduate-level QM course. This correlation provides further evidence of the construct validity of the QMFPS, since the concepts covered in the final exam were similar to those covered in the QMFPS.
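Related-measures evidence of this kind rests on the Pearson correlation between paired scores for the same students. The sketch below computes r for two lists of paired scores, together with the standard t statistic t = r · sqrt((n − 2)/(1 − r²)) for testing whether the correlation differs from zero, with n − 2 degrees of freedom. The function name `correlation_with_t` and the paired scores are hypothetical and invented for illustration; they are not the study's data.

```python
import math


def correlation_with_t(x, y):
    """Pearson r for paired scores and the t statistic for H0: rho = 0 (df = n - 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # t grows without bound as |r| -> 1; report infinity for a perfect correlation
    t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, t, n - 2


# Hypothetical paired percentages for the same six students on two surveys
# (invented for illustration, not QMS/QMFPS data).
survey_a = [45, 60, 30, 75, 55, 40]
survey_b = [35, 50, 20, 70, 40, 35]
r, t, df = correlation_with_t(survey_a, survey_b)
print(f"r = {r:.2f}, t = {t:.2f}, df = {df}")
```

Whether a given r counts as "moderate" or "strong" depends on the convention adopted; the significance test only indicates whether the correlation is distinguishable from zero for the given sample size.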

III. SUMMARY
Learning QM is challenging for students partly because of the "paradigm shift" from classical mechanics to quantum mechanics and partly because of the mathematical expertise required to solve problems. Students in traditionally taught and evaluated QM courses may be able to "hide" their lack of conceptual understanding of the formalism and postulates of QM behind their mathematical skills [3]. However, for students to develop a functional understanding, it is important to close the gap [43] between conceptual and quantitative problem solving by assessing both types of learning. We have developed a conceptual survey that assesses students' conceptual knowledge of the formalism and postulates of QM, topics that instructors of QM courses agree are important to cover [25]. The development of the test followed the standards of multiple-choice test design [37][38][39]: we ensured that the test was valid and reliable, had good discrimination, and was tailored to the target population, and we collected good comparative data [37][38][39].
Details of the student difficulties identified via the QMFPS are beyond the scope of this paper and will be discussed elsewhere. Student responses to questions on the QMFPS can be used as a formative assessment [44,45] to help instructors identify common student difficulties and guide the design of instructional strategies and learning tools to improve students' understanding. This survey can also be administered to students after instruction in the relevant concepts to evaluate the effectiveness of instruction on relevant topics in a particular course.
Furthermore, we found that students enrolled in QM courses that used active learning methods, such as peer instruction and tutorials, performed better on the QMFPS than those who did not. Such approaches include active learning techniques such as peer instruction [52,53], tutorials [54][55][56][57][58][59][60][61][62][63][64][65][66][67][68], cooperative group problem solving [69], and Just-in-Time Teaching [70,71], which can help students develop a coherent knowledge structure of the formalism and postulates of QM. In addition, we found that, although graduate students performed significantly better than undergraduate students, their average overall score was not very high. This may partly be because graduate students who were taught primarily via traditional approaches may have developed algorithmic skills for solving exam problems, since exams often reward "plug-and-chug" approaches, without developing a conceptual understanding of quantum mechanics. Moreover, even graduate students may not be motivated to develop a coherent knowledge structure of QM if course assessments focus only on quantitative reasoning. Therefore, to help students develop a functional knowledge of quantum mechanics, we suggest that the learning goals for upper-level QM include proficiency in the concepts covered in the QMFPS and emphasize the connection between conceptual understanding and mathematical formalism. Furthermore, instructors of graduate-level QM courses can reflect on their students' responses on the QMFPS to design instruction that helps "close the gap" between students' conceptual understanding and quantitative problem solving.

Quantum Mechanics Formalism and Postulates Survey
Definitions, notation, and instructions: For a spinless particle confined in one spatial dimension, the expectation value of a time-independent physical observable $Q$ in a state $|\Psi(t)\rangle$ at time $t$ in position space is $\langle Q \rangle = \int_{-\infty}^{\infty} \Psi^*(x,t)\,\hat{Q}\,\Psi(x,t)\,dx$.
1. Choose all of the following statements that are correct for a particle interacting with a one-dimensional (1D) infinite square well.
(1) The appropriate Hilbert space for this system is one-dimensional.
(2) The energy eigenstates of the system form a basis in a 1D Hilbert space.
(1) The measurement of the energy will yield either $E_1$ or $E_2$.
(3) The measurement of the energy will yield
6. $|\Psi(0)\rangle$ is the initial state of the system at time t = 0 and $\hat{H}$ is the Hamiltonian operator. Choose all of the following statements that are necessarily correct for all times t > 0.
(2) $e^{-i\hat{H}t/\hbar}$ is a unitary operator.
(1) Right after the position measurement, the wavefunction will be peaked about a particular value of position.
(2) The wavefunction will not go back to the first excited state wavefunction, even if you wait for a long time after the position measurement.
9. An operator $\hat{Q}$ corresponding to a physical observable has a continuous non-degenerate spectrum of eigenvalues. The states $\{|q\rangle\}$ are eigenstates of $\hat{Q}$ with eigenvalues $q$. At time t = 0, the state of the system is $|\Psi\rangle$. Choose all of the following statements that are correct.
(1) A measurement of the observable must return one of the eigenvalues of the operator $\hat{Q}$.
Choose all of the following statements that are correct. Ignore normalization issues.
15. Choose all of the following statements that are correct:
(1) The stationary states refer to the eigenstates of any operator corresponding to any physical observable.
(2) In an isolated system, if a particle is in a position eigenstate (has a definite value of position) at time t = 0, the position of the particle is well-defined at all times t > 0.
(3) In an isolated system, if a system is in an energy eigenstate (it has a definite energy) at time t = 0, the energy of the particle is well-defined at all times t > 0.
(1) An observable whose corresponding time-independent operator commutes with the time-independent Hamiltonian of the system, $\hat{H}$, corresponds to a conserved quantity (constant of motion).
(2) If an observable $Q$ does not depend explicitly on time, $Q$ is a conserved quantity.
(1) You can always find a complete set of simultaneous eigenstates for compatible operators.
(2) You can never find a complete set of simultaneous eigenstates for incompatible operators.
(1) The state of each silver atom in beam A will become a superposition of two spatially separated components after passing through a Stern-Gerlach apparatus with a magnetic field gradient in the −z-direction (SGZ-).
(2) We can distinguish between Beam A and Beam B by analyzing the pattern on a distant screen after each beam is sent through a Stern-Gerlach apparatus with a magnetic field gradient in the −z-direction (SGZ-).
(3) We can distinguish between Beam A and Beam B by analyzing the pattern on a distant screen after each beam is sent through a Stern-Gerlach apparatus with a magnetic field gradient in the −x-direction (SGX-).
A. 1 only B. 2 only C. 1 and 2 only D. 1 and 3 only E. All of the above.
In questions 26-30, the Hamiltonian of a charged particle with spin-1/2 at rest in an external uniform magnetic field is $\hat{H} = -\gamma B \hat{S}_z$, where the uniform field $B$ is along the z-direction and $\gamma$ is the gyromagnetic ratio (a constant). The phrase "immediate succession" implies that the time evolution can be ignored between the first and second measurements.
26. Suppose that at time t = 0, the particle is in an initial normalized spin state | ⟩ = | . Choose all of the following statements that are correct about measurements performed on the system starting with this initial state at t = 0.
(1) If you measure immediately following another measurement of at t = 0, both measurements of will yield the same value $\hbar/2$.
(2) If you first measure $\vec{S}^2$ at t = 0 and then measure in immediate succession, the measurement of will yield the value $\hbar/2$ with 100% probability.
(2) If you first measure $\vec{S}^2$ and then measure in immediate succession, the measurement of will yield the value $\hbar/2$ with 100% probability.
(1) The expectation value 〈 〉 depends on time.
(2) The expectation value 〈 〉 depends on time.
Choose all of the following statements that are correct:
(1) The expectation value 〈 〉 depends on time.
(2) The expectation value 〈 〉 depends on time.
(1) If you measure the position of the particle at time t = 0, the probability density for measuring x is | ( − ) |².
(2) If you measure the energy of the system at time t = 0, the probability of obtaining $E_1$ is
(1) If you measure the position of the particle after a time t, the probability density for measuring x is | ( − ) |².
(2) If you measure the energy of the system after a time t, the probability of obtaining $E_1$ is
Choose all of the following statements that are correct at time t = 0:
(1) If you measure the position of the particle at time t = 0, the probability density for measuring x is $|[\psi_1(x) + \psi_2(x)]/\sqrt{2}|^2$.
(2) If you measure the energy of the system at time t = 0, the probability of obtaining $E_1$ is
Choose all of the following statements that are correct at a time t > 0:
(1) If you measure the position of the particle after a time t, the probability density for measuring x is $|[\psi_1(x) + \psi_2(x)]/\sqrt{2}|^2$.
(2) If you measure the energy of the system after a time t, the probability of obtaining $E_1$ is