Using the Rasch model to analyze the test of understanding of vectors

Ana Susac, Maja Planinic, Damjan Klemencic, and Zeljka Milin Sipus
Department of Applied Physics, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia
Department of Physics, Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
Department of Mathematics, Faculty of Science, University of Zagreb, Bijenicka 30, 10000 Zagreb, Croatia

The PER community has adopted different approaches to data analysis of multiple-choice questions [18]. Widely used physics diagnostic tests are often evaluated in multiple ways, e.g., using classical test theory, item response theory, factor analysis, or Rasch analysis [19][20][21][22][23][24]. In the case of the test of understanding of vectors (TUV), classical test theory was used for evaluations during the development stage and for the analysis of the final version [17]. The three-parameter logistic model of item response theory (IRT) and the item response curves technique were employed to analyze the TUV when it was released [25].
The Rasch model is another useful tool for the evaluation of tests that are intended to be used as assessment instruments [18]. Because a Rasch model-based analysis of the TUV has not yet been performed, we decided to reevaluate the TUV using that approach.
In this paper we aimed to answer the following research questions: (i) How does the TUV function on a sample of first-year engineering and science students? (ii) What difficulty scale of vector concepts is suggested by students' scores on the TUV? We performed the Rasch model-based analysis of the TUV to address these research questions.

A. Participants
The study included 889 undergraduate first-year engineering and science students from the University of Zagreb. A detailed description of the sample is given in the Supplemental Material [26].

B. Data collection
The test of understanding of vectors [17] was translated into Croatian and validated by two university lecturers in physics and one university lecturer in mathematics. The test was administered to the participants during their second semester at the university.

C. Data analysis
We used the Rasch model to analyze the collected data. The Rasch model is an important psychometric tool when conducting science education research utilizing multiple-choice tests [27]. For an introduction to Rasch analysis see, for example, Ref. [28]. We used the Winsteps software [29] to conduct the Rasch analysis.
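For orientation, the dichotomous form of the Rasch model can be written as below. This is a sketch that assumes each multiple-choice item is scored as right or wrong; the symbols \theta_n (ability of student n), \delta_i (difficulty of item i), and X_{ni} (scored response) are notation introduced here for illustration and do not appear in the text above.

```latex
P(X_{ni}=1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)},
\qquad
\ln\frac{P(X_{ni}=1)}{1 - P(X_{ni}=1)} = \theta_n - \delta_i .
```

In this form the log-odds of a correct answer depend only on the difference between student ability and item difficulty, which is why both quantities can be placed on a single logit scale.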

A. Analysis of test structure and functioning
The Rasch analysis of the collected data showed very high item reliability (0.99), which indicates that the order of items by difficulty would be replicated if the test were administered to another similar sample of students. The person reliability was rather low (0.60), which indicates that the order of persons would not be reliably reproduced if a similar test were administered to the same students. The person reliability reports the reproducibility of the person measure order obtained through Rasch analysis. This index can range from 0 to 1, but its minimum meaningful value is 0.5, whereas the lowest person reliability for any decision making about students' abilities (e.g., discernment between high and low performers) is 0.8. The most effective way to increase the person reliability is to increase the number of test items [30]. The Cronbach alpha was 0.79, which can be considered satisfactory. The obtained value corresponds to the value of the Kuder-Richardson reliability index (0.78) reported by Barniol and Zavala for their sample of students [17].
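To give a rough sense of how test length relates to reliability, the classical Spearman-Brown prophecy formula can be applied to the reported person reliability. This is only an illustrative sketch: it assumes that the added items behave like the existing ones, which would not hold if better-targeted (more difficult) items were added, and the function names are hypothetical.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when the test length is multiplied by length_factor."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def length_factor_for_target(reliability: float, target: float) -> float:
    """Factor by which the test must be lengthened to reach the target reliability."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


current_items = 20          # the TUV items are labeled Q1-Q20
current_reliability = 0.60  # person reliability reported above
target_reliability = 0.80   # threshold for decisions about student abilities

factor = length_factor_for_target(current_reliability, target_reliability)
items_needed = math.ceil(factor * current_items)
print(f"length factor: {factor:.2f}, items needed: {items_needed}")
```

Under these assumptions the test would need to be lengthened by a factor of about 2.7; well-targeted difficult items would be expected to require fewer additions than comparable ones.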
Figure 1 shows the distributions of student abilities and item difficulties on the same logit scale. One logit is the distance along the scale that increases the odds of observing the event specified in the measurement model by a factor of 2.718 (the base of the natural logarithm e). The average item difficulty is set at zero. More able students and more difficult items have a more positive value of Rasch measure. Figure 1 reveals that the mean student ability is about 2 logit above the mean item difficulty, which indicates a rather poor test targeting. The test is too easy for our sample of first-year university students. A significant fraction of students, 11.5% (102 students), solved all test items correctly, and an additional 13.3% (118 students) solved 19 test items correctly. About half of the students had abilities outside the range of item difficulties. The test does not contain items that would help to better estimate the abilities of these students. Consequently, the person reliability is quite low, as reported above.
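The meaning of the logit scale can be illustrated with a short calculation. The sketch below assumes the dichotomous Rasch model; the function names are hypothetical. It shows that a one-logit step multiplies the odds of success by e, and that a student 2 logit above an item of average difficulty is expected to answer it correctly most of the time.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def odds(probability: float) -> float:
    """Odds of success corresponding to a probability of success."""
    return probability / (1.0 - probability)


# Student matched to the item: ability equals difficulty.
p_matched = rasch_probability(0.0, 0.0)

# Student 2 logit above an item of average difficulty (difficulty = 0).
p_two_logits = rasch_probability(2.0, 0.0)

# Ratio of odds for abilities one logit apart on the same item.
odds_ratio = odds(rasch_probability(1.0, 0.0)) / odds(rasch_probability(0.0, 0.0))

print(f"matched: {p_matched:.2f}, two logit above: {p_two_logits:.2f}")
print(f"odds ratio per logit: {odds_ratio:.3f}")
```

A success probability near 0.88 on an average item is one way to see why a 2 logit offset between the means signals a test that is too easy for the sample.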
If the test is meant to be well targeted, then the distribution of item difficulties should be aligned with the distribution of student abilities. Ideally, items should be distributed more or less evenly along the whole range of student abilities, since student ability is best evaluated by items in the ±1 logit interval around the ability value. For the sample of students in this study, the TUV has enough easy items, but it does not have enough difficult items. There are no items centered on students who are in the ability range of 2-5 logit. These results indicate that the TUV would benefit from including more difficult items.
FIG. 1. The person-item map shows the distribution of student abilities on the left-hand side and the distribution of item difficulties on the right-hand side displayed on the same logit scale. Each "hash" represents four students and each "dot" represents one to three students. The TUV items are labeled as Q1-Q20. M denotes the mean of each distribution; S denotes 1 standard deviation, and T 2 standard deviations from the mean.
To evaluate specificities of different student populations, we conducted the Rasch analysis of the data collected from subpopulations in our sample and the obtained person-item maps are shown in the Supplemental Material [26].The results show that the TUV had good targeting only for the subsample of students from the Faculty of Chemical Engineering and Technology.

B. Analysis of test items
Furthermore, we examined the fit of the items with the Rasch model by calculating infit and outfit mean square (MNSQ) residuals and standardized Z scores for each item (Table I). Typically, items are considered to have acceptable fit if both infit and outfit MNSQ are between 0.7 and 1.3, and Z scores are between −2 and 2 [28], but items with MNSQ in the range 0.5-1.5 will still be productive for measurement [30]. High infit values of MNSQ and Z scores indicate that students do not respond in the expected way to items whose difficulties correspond to their ability. High outfit values of MNSQ and Z scores usually indicate outliers (e.g., a student of lower ability answers a very difficult test item correctly). Low infit and outfit values of MNSQ and Z score indicate overfit to the Rasch model, i.e., the data are more predictable than the model expects, and the items therefore do not provide much new information, but they do not degrade the measurement.
Table I shows that most items fit well with the model. The item that is problematic to a certain extent, because of a larger misfit, is item 14. Rather high outfit values of its MNSQ and Z score were largely caused by some students of higher ability who unexpectedly failed on this test item. This agrees with the results from the previous study by Rakkapao et al. [25], where rather low discrimination power was found for this item. Item 14 refers to calculation of the x component of a vector when the angle is measured from the y axis. The majority (70%) of students who failed on this item chose the distractor in which the sine function was replaced by cosine. This might be caused by students' lack of attention (angle was measured from the y axis) or their difficulties with trigonometric functions. Overall, item 14 should be further examined and possibly revised in a future version of the TUV. Items 17 and 12 also show high values of Z score, but their MNSQ values are acceptable. Since Z scores are too sensitive for large samples [30], items 17 and 12 can be considered productive for measurement.
The Rasch model assumes unidimensionality, i.e., the existence of a single underlying measurement construct. The point-measure correlations can help identify the existence of the construct. Positive point-measure correlation of an item indicates that the item is in line with the measured construct. The size of correlations shows how much the items contribute to the construct. Table I shows that the point-measure correlations of all items are positive, which suggests that all items measure the underlying construct (understanding of vectors).
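The difference between infit and outfit can be made concrete with a minimal sketch of the usual mean-square definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version. The abilities and responses below are hypothetical values chosen for illustration, not data from this study, and the code is not the Winsteps implementation.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit MNSQ, outfit MNSQ) for one dichotomously scored item."""
    squared_residuals = []
    variances = []
    for score, ability in zip(responses, abilities):
        p = rasch_probability(ability, difficulty)
        squared_residuals.append((score - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: mean of squared standardized residuals (sensitive to outliers).
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical person measures (logits) and an item of difficulty 0.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
expected_pattern = [0, 0, 0, 1, 1, 1]
# Same pattern, except the most able student unexpectedly fails the item.
surprising_pattern = [0, 0, 0, 1, 1, 0]

infit_ok, outfit_ok = item_fit(expected_pattern, abilities, 0.0)
infit_bad, outfit_bad = item_fit(surprising_pattern, abilities, 0.0)
print(f"expected pattern:   infit {infit_ok:.2f}, outfit {outfit_ok:.2f}")
print(f"surprising pattern: infit {infit_bad:.2f}, outfit {outfit_bad:.2f}")
```

In this toy example a single unexpected failure by a high-ability student raises outfit much more than infit, which mirrors the kind of misfit pattern described for outlying responses.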
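A point-measure correlation can be sketched as the Pearson correlation between the scored responses to one item and the person measures. The example below uses hypothetical measures and responses, not data from this study, and the function name is an assumption introduced for illustration.

```python
import math


def point_measure_correlation(responses, measures):
    """Pearson correlation between item scores (0/1) and person measures (logits)."""
    n = len(responses)
    mean_score = sum(responses) / n
    mean_measure = sum(measures) / n
    covariance = sum(
        (x - mean_score) * (m - mean_measure) for x, m in zip(responses, measures)
    )
    score_ss = sum((x - mean_score) ** 2 for x in responses)
    measure_ss = sum((m - mean_measure) ** 2 for m in measures)
    return covariance / math.sqrt(score_ss * measure_ss)


# Hypothetical person measures in logits.
measures = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Item answered correctly by the more able students: in line with the construct.
aligned = point_measure_correlation([0, 0, 0, 1, 1, 1], measures)

# Item answered correctly only by the less able students: works against it.
reversed_item = point_measure_correlation([1, 1, 1, 0, 0, 0], measures)

print(f"aligned item: {aligned:.2f}, reversed item: {reversed_item:.2f}")
```

A negative value would flag an item on which more able students tend to do worse, which is the situation the positive correlations in Table I rule out.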

C. Difficulty scale of vector concepts
The TUV was developed to test ten vector concepts used in the introductory physics courses [17]. Each vector concept was tested by one to three TUV items. To compare the difficulties of the TUV items related to each vector concept, we calculated their average values and their uncertainties (Fig. 2).
The most difficult vector concept appears to be the unit vector. This concept was tested by only one test item (item 2). This finding is consistent with the results from the previous studies [17,25]. For the group of Thai students, item 2 was the most difficult TUV item, while it was the third most difficult item for the group of Mexican students. The most popular distractor in item 2 in all three studies (see Refs. [17,25] and the present study) was the choice in which the x and y components were unit vectors. Barniol and Zavala [14] and Rakkapao et al. [25] found that the students who chose this answer believed that this vector had a magnitude 1. So, we might conclude that many students who failed to select the correct answer on item 2 probably knew the definition of unit vector but they did not correctly use the notion of vector magnitude and vector decomposition. To better examine student understanding of the unit vector, it would be useful to have more test items on this vector concept.
The next two concepts according to difficulty are cross product and subtraction of vectors. Two items on cross product were above the average difficulty, and one item was close to the average. Item 12 (geometric interpretation of the cross product as a perpendicular vector satisfying right-hand rule) and item 15 (calculation of the cross product of vectors written in unit-vector notation) were among the most difficult items, whereas item 18 (identifying the correct formula for cross product magnitude) was considerably easier. Both items 12 and 15 refer to the direction of the vector product, and the most frequent incorrect answers on both items were vectors with the opposite direction to the correct vector, which possibly indicates the misapplication of the right-hand rule. This finding is in agreement with the previous studies on student difficulties with cross product direction [13,16,31].
Subtraction of vectors is significantly more difficult than the addition of vectors, and that was corroborated by the previous reports [17,25]. For our sample of students, subtraction of vectors in one dimension (item 19) was more difficult than subtraction in two dimensions (item 13). A similar result was reported for the group of Thai students [25], while Mexican students had more difficulties with subtraction of vectors in two dimensions [17]. Heckler and Scaife [32] found that many student difficulties with simple vector addition and subtraction lie with the arrow representation itself, so they suggested the introduction of another representation (e.g., unit vectors ijk notation) together with the arrow notation.
The last two vector concepts that were more difficult than the average are dot product and vector direction. Student understanding of the vector direction was tested by item 5 (choosing a vector with the same direction as the given vector from among several options) that was easier than the average, and item 17 (calculation of direction of a vector written in unit-vector notation) that was the third most difficult item in the test (Fig. 1). Furthermore, the dot product was easier than the cross product for students in this study, similar to the results in the previous study by Rakkapao et al. [25]. Barniol and Zavala found the opposite; for their sample of students, the cross product appeared easier than the dot product [17]. This might be caused by the fact that they were tested after the course on electricity and magnetism in which the cross product of vectors is often used. In any case, the dot product is among the more demanding vector concepts in all three studies. For the sample of students in our study, item 6 (identifying correct formula for a dot product) was significantly easier than items 3 (geometric interpretation of a dot product as a projection) and 8 (calculation of a dot product of vectors written in unit-vector notation). This suggests that knowing the formula is not enough for problem solving [33,34].
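The averaging step can be sketched as follows. The text does not specify how the uncertainties were obtained, so this example assumes, purely for illustration, that the Rasch standard errors of the individual items are propagated as independent errors. The item measures and standard errors are placeholder numbers, not values from Table I; only the grouping of items 12, 15, and 18 under the cross product follows the text.

```python
import math


def concept_difficulty(item_measures, item_errors):
    """Mean item difficulty of a concept and its propagated standard error.

    Assumes independent item errors: SE(mean) = sqrt(sum(se_i^2)) / n.
    """
    n = len(item_measures)
    mean = sum(item_measures) / n
    error = math.sqrt(sum(se ** 2 for se in item_errors)) / n
    return mean, error


# Placeholder measures and standard errors in logits (NOT values from Table I).
placeholder_items = {
    "Q12": (1.0, 0.1),
    "Q15": (0.8, 0.1),
    "Q18": (-0.3, 0.1),
}

measures = [measure for measure, _ in placeholder_items.values()]
errors = [error for _, error in placeholder_items.values()]
mean_difficulty, mean_error = concept_difficulty(measures, errors)
print(f"cross product (placeholder data): {mean_difficulty:.2f} +/- {mean_error:.2f} logit")
```

For a concept tested by a single item, such as the unit vector, this procedure simply returns that item's measure and standard error.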
The remaining vector concepts (scalar multiplication of a vector, vector addition, graphic representation of a vector, vector components, magnitude of a vector) were below average difficulty, and did not pose a problem for most of the students.

IV. CONCLUSION
The Rasch analysis of the test of understanding of vectors showed rather good functioning of test items. Only item 14 deserves further inspection and possibly revision. However, the TUV was not well targeted to our sample of first-year engineering and science students in introductory physics courses. The mean of the distribution of item difficulties was about two logit below the mean of the distribution of student abilities, indicating that the test was too easy for our sample. The lack of more difficult test items resulted in low person reliability. Nevertheless, the analysis of the data from subpopulations in our sample showed that the TUV can be well targeted for certain student populations [26]. Furthermore, its functioning might depend on educational systems and teaching of vector concepts. For example, our sample of first-year engineering and science students in Croatia had better results on the TUV than Mexican and Thai students from the previous studies [17,25]. The possible reason for the observed difference might be that Croatian students learned about some of the vector concepts tested in this study already in high school.
FIG. 2. Average difficulties of the vector concepts, evaluated by the TUV, measured in logits.
One could argue that the understanding of vector concepts is the essential prerequisite for attending introductory physics courses and that high student scores on the TUV are thus expected. In that case the TUV could be applied as a pass or fail assessment instrument. High scores on the TUV would indicate that most students acquired good understanding of vectors, and that they do not have difficulties with vector quantities and operations with vectors. Unfortunately, from our teaching practice, we could not confirm that this is the case. Students in our introductory courses beyond mechanics still struggle with some vector concepts, such as cross and dot products. Thus, taking into account the results of the Rasch analysis and our insight into student difficulties with vector quantities in physics, we suggest adding more difficult items to the TUV.
Besides adding more items on the vector concepts already included in the TUV, but with only one item (e.g., unit vector), it might be beneficial to include some other vector problems often used in introductory physics courses (e.g., vector decomposition in nonorthogonal directions, adding or subtracting vectors that do not have the same initial point, finding the difference of equally long vectors with opposite directions, or, when given a resulting vector of a cross product and one factor, finding the possible directions of the unknown factor). These are examples of the less-standard vector problems encountered in introductory physics, which may present difficulty for students. It is our experience that students indeed do have difficulties applying their knowledge of vectors to such situations, and that is something that should be detectable by the TUV and similar diagnostic instruments.
According to the results of the Rasch analysis, it seems that the most difficult vector concept tested by the TUV is the unit vector, followed by the cross product, subtraction of vectors, dot product, and the direction of a vector. The order of vector concepts by difficulty might help instructors in introductory physics courses to put more emphasis on these concepts in their teaching in order to help students to overcome the observed difficulties.

TABLE I. Item difficulty measured in logits, Rasch standard error, infit and outfit MNSQ (mean square residuals) and Z scores, and point-measure correlation for each TUV item.