Validating the Japanese translation of the Force and Motion Conceptual Evaluation and comparing performance levels of American and Japanese students

Michi Ishimoto, Ronald K. Thornton, and David R. Sokoloff
School of Environmental Science and Engineering, Kochi University of Technology, Tosayamada-cho, Kami-shi, Kochi 782-8502, Japan
Departments of Physics and Education, Center for Science and Mathematics Teaching, Tufts University, Medford, Massachusetts 02155, USA
Department of Physics, University of Oregon, Eugene, Oregon 97403, USA
(Received 17 March 2014; published 19 August 2014)


I. INTRODUCTION
The use of standard examinations, such as the Force Concept Inventory (FCI) [1] and the Force and Motion Conceptual Evaluation (FMCE) [2], to test student understanding of physics concepts has led to the development of many effective instructional methods in the United States [3]. In Japan, many physics instructors have attributed student learning to students' individual characteristics rather than to the instructional methods employed. These educators do not have the means to measure the efficacy of their instructional methods. The dissemination of concept inventories, such as the FCI and the FMCE, is a first step in improving instructional methods scientifically. Thus far, no original concept inventories have been developed in Japan. For this reason, the Japanese translation of established concept inventories in the United States is the fastest, most convenient way for Japanese educators to obtain concept inventories. Because of differences between the Japanese and English languages, and between the Japanese and American educational systems, it is important to assess the Japanese translation of the concept inventories, tests originally developed in English for American students.
In Japan, the subject of science has been taught in the Japanese language for 150 years. All technical terms have been translated into Japanese so that all Japanese can teach and learn science without learning foreign languages. We surmise that most Japanese physics teachers have limited English proficiency and do not have a full understanding of the nature of the concept inventories, which differ from regular physics problems. From about the year 2000 onward, several Japanese translations of the FCI (written in plain English) have been produced. In general, the translators managed to produce a well-written introductory physics aptitude test but paid little attention to the distractors. They used their translated versions to calculate the normalized gain so as to evaluate their instruction. Beginning in 2009, several Japanese researchers in physics education worked to amalgamate the translated versions of the FCI in existence at that time and then uploaded the amalgamated FCI (we refer to it as the "FCIJ" in this study) to the Arizona State University Modeling Instruction Program website in 2011 [4]. However, a translation does not guarantee an accurate transfer of the meaning of the original work. To the best of our knowledge, validation of the FCIJ is still required.
In Japan, the FMCE is not as widely used as the FCI owing to the longer time required to administer the FMCE. The time factor is especially important in high schools where more instructors are interested in finding ways to improve their instructional delivery. In addition, physics instructors using the FCI are not aware of the merits of using the FMCE for class assessment purposes. One of the authors of the present study translated the FMCE and has been using the translated version to evaluate the effectiveness of class instruction since 2003. We refer to this translated version as the "FMCEJ" in this study. We chose to evaluate the test quality and validity of the FMCEJ rather than the FCIJ for the following four reasons: (1) The FMCE provides a more detailed measure of student understanding by virtue of having a greater number of items covering a narrower range of topics to probe [5]. (2) We have a large set of pretest data to validate. (3) American data are available for comparison purposes. (4) The FMCE is in a clustering format, thus allowing detailed analysis. We used indices of the classical test theory to validate the translation because they are simple to calculate and are considered a standard procedure for ensuring the quality of the concept inventory [6].

II. METHODOLOGY
To be considered useful tools for assessing student learning, concept inventories have to meet the conditions of a high-quality test and elicit students' concepts quantitatively. We used the classical test theory indices to examine the test quality of the FMCEJ because the indices distinguish the quality of multiple-choice tests [6]. We calculated the indices to determine if they satisfy the recommended criteria of a high-quality test [7]. To examine whether the FMCEJ measures Japanese students' concepts with reasonable accuracy, we assumed that Japanese students and American students share the same concepts of motion and force before they receive formal instruction in physics. Based on this assumption, American and Japanese data should show similar responses on the pretest. If the similarity is reasonably close, the test is likely to be valid. Although test validity is usually examined by interviewing students, interviewing methods for the validation of physics concepts have not been established in Japan. Therefore, we decided to compare our data from Japanese students statistically with data from American students for validation. For the purpose of comparison with our data, we used Smith and Wittmann's [8] data to represent American data.

A. Students surveyed
For the purposes of the present study, the FMCEJ was administered to mostly first-year students attending a basic mechanics class at the Kochi University of Technology (KUT) between 2003 and 2012. This midsized public university has an engineering division that consists of three schools: the School of Systems Engineering (SE), the School of Information (I), and the School of Environmental Science and Engineering (ESE). We obtained data from 1095 students who answered all 47 questions on the FMCEJ. (The surveyed students were distributed among the schools as follows: 40% SE, 5% I, and 55% ESE.) Each year, more than 500 000 university candidates in Japan take the National Center Test (NCT) for University Admissions, a standardized academic aptitude test. On average, the KUT students surveyed for the present study earned a score of 63 points (of 100 points) on the NCT in physics, which is comparable to the national average of 62.68 points reported in 2013 [9]. Thus, we infer that the data are suitable for assessing the physics preconcepts of typical Japanese students.

B. Basic statistics of the FMCEJ for two kinds of scoring systems
The FMCE is designed to probe students' views of force and motion concepts using clusters with respect to subjects (kinematics, Newton's second law of motion, Newton's third law of motion, and energy conservation) and representation (natural language or graphical questions). According to Thornton and Sokoloff [2], an accurate evaluation could be obtained with 40 of the 47 questions. They omitted questions 5, 6, 15, 33, 35, 37, and 39. Thornton et al. [5] said that single-number scores (SNSs) could be useful in some circumstances and designed an SNS with a total of 33 points. (Their approach entailed excluding the portion on energy conservation, which consists of 4 questions.) They suggested that 9 of the 40 questions be divided into three sets of 3 questions, with a score of 2 points being obtained if all 3 questions in a set are answered correctly [(2 points × 3) + (40 − 4 − 9 points) = 33 points].
In the present study, we use their SNSs and 4 points from the 4 questions on energy conservation and refer to them as our SNSs. The inclusion of the 4 questions on energy conservation thus changes the possible total score on the FMCEJ to 37 points. The SNSs are well correlated with the simple sum scores (SSSs) for the 47 questions (Fig. 1). The SSS values overestimate the low-scoring students and underestimate the high-scoring students. Table I shows the statistical values of the SSSs and the SNSs. The score distributions of the SSS and the SNS are shown in Fig. 2.
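The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script. The excluded items and the energy items (44 through 47) are taken from the text; the three all-or-nothing sets are an assumption for illustration (items 8 to 10 and 11 to 13 are named in the RD cluster discussion, while the third set, items 27 to 29, is a placeholder).

```python
# Items dropped by Thornton and Sokoloff, as listed in the text.
EXCLUDED = {5, 6, 15, 33, 35, 37, 39}
# The four energy-conservation items.
ENERGY = {44, 45, 46, 47}
# All-or-nothing sets worth 2 points each.
# ASSUMPTION: the third triple (27, 28, 29) is a placeholder for illustration.
TRIPLES = [(8, 9, 10), (11, 12, 13), (27, 28, 29)]


def single_number_score(correct, include_energy=True):
    """Return the SNS for a set of correctly answered item numbers (1..47)."""
    in_triples = {item for triple in TRIPLES for item in triple}
    score = 0
    # 2 points only if every item in a triple is correct.
    for triple in TRIPLES:
        if all(item in correct for item in triple):
            score += 2
    # 1 point for each remaining scored item.
    for item in range(1, 48):
        if item in EXCLUDED or item in in_triples:
            continue
        if item in ENERGY and not include_energy:
            continue
        if item in correct:
            score += 1
    return score


all_items = set(range(1, 48))
print(single_number_score(all_items, include_energy=False))  # 33
print(single_number_score(all_items))                        # 37
```

A perfect paper reproduces the two maxima quoted in the text: 33 points without the energy items and 37 points with them.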

III. EVALUATION OF THE FMCEJ
Table II shows the evaluation of the FMCEJ based on the classical test theory [6]. The difficulty index is the ratio of correct responses to the total number of responses for each item. The average difficulty index over the 47 items of the FMCEJ was 0.34 for the pretest, and the value was expected to rise for the posttest. The discrimination index determines the discrimination power of an individual test item. It measures the extent to which a single test item distinguishes students who know the material well from those who do not. The point biserial coefficient is defined as "a measure of consistency of a single item with the whole test. It reflects the correlation between students' scores on an individual item and their scores on the entire test" [7]. As shown in Table II, the averages of these indices for the FMCEJ are above the desired values [7]. The reliability index KR-20 is a measure of the self-consistency of a whole test. Ferguson's δ is a measure of the discriminatory power of an entire test, meaning that it reflects how broadly the total scores of a sample are distributed over a possible range. The reliability index and the discriminatory power of the pretest were 0.91 and 0.98, respectively. The index values indicate that the FMCEJ is a high-quality concept inventory test and is appropriate for evaluating individuals' understanding of concepts of force and motion.
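The two whole-test indices can be illustrated with a short Python sketch. It uses the standard textbook definitions of KR-20 and Ferguson's δ with the population variance of the total scores; the exact computational choices behind Table II are not stated in the text, so this should be read as an assumption-laden illustration rather than the authors' procedure. The response matrix is toy data.

```python
from collections import Counter


def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores (rows = students)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance
    # Sum of p(1 - p) over items, where p is the item difficulty index.
    pq_sum = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        pq_sum += p * (1 - p)
    return k / (k - 1) * (1 - pq_sum / variance)


def ferguson_delta(responses):
    """Ferguson's delta: how broadly total scores spread over 0..K."""
    n = len(responses)
    k = len(responses[0])
    freq = Counter(sum(row) for row in responses)
    sum_sq = sum(f * f for f in freq.values())
    return (n * n - sum_sq) / (n * n - n * n / (k + 1))


# Toy data: 4 students, 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))            # 0.75
print(ferguson_delta(toy))  # 1.0
```

In the toy data every possible total score (0 to 3) occurs exactly once, so δ reaches its maximum of 1.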
Table III shows each item's difficulty index, point biserial coefficient, and discrimination index based on data from the top 25% of scorers and the bottom 25% of scorers. The columns labeled "MC" and "NA" in Table III show the FMCEJ distractor symbols for the most common sense concepts (MCs) and the proportion of "no correct answers" (NAs). The results showing that KUT students' MC symbols agree with those of American students support our assumption that Japanese students and American students share the same concepts of motion and force before they receive formal instruction in physics. This similarity supports the FMCEJ as a valid form of assessment. The items with different MCs from those of the U.S. counterparts are items 23, 46, and 47 (presented in bold type). We discuss these three items in greater depth in the cluster section. The 10 blank spaces in the MC columns of Table III indicate that no MCs were probed. The first 7 blank spaces correspond to the 7 items considered to be irrelevant for listing MCs. According to Thornton and Sokoloff [2], the students' responses for these 7 items reflect something other than the intent of these questions, so we left them blank. The remaining 3 blank spaces correspond to velocity graph questions. Because the correct response rate was near 90% for these questions, there is no point in eliciting MCs on these items. The classical test theory indices of the FMCEJ are above the desired values, and the MCs of the Japanese students, with the exception of three items (items 23, 46, and 47), agree with those of American students. Thus, we regard the FMCEJ as a high-quality test that can be used to compare Japanese students' and American students' understanding of Newtonian concepts.
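The three item-level indices reported in Table III can be sketched as follows. The definitions are the standard classical test theory ones, with the discrimination index taken from the top and bottom 25% of scorers as described above. The function names and the toy response matrix are illustrative assumptions, and details such as tie handling at the quartile boundaries are not specified in the text.

```python
import math


def difficulty_index(responses, item):
    """Fraction of students who answered the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item):
    """Correct count in the top 25% minus the bottom 25%, per group size."""
    ranked = sorted(responses, key=sum, reverse=True)  # rank by total score
    quarter = len(ranked) // 4  # needs at least 4 students
    top, bottom = ranked[:quarter], ranked[-quarter:]
    diff = sum(r[item] for r in top) - sum(r[item] for r in bottom)
    return diff / quarter


def point_biserial(responses, item):
    """Correlation between a 0/1 item score and the total test score."""
    totals = [sum(row) for row in responses]
    n = len(totals)
    mean_all = sum(totals) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)
    p = difficulty_index(responses, item)  # must satisfy 0 < p < 1
    right = [t for row, t in zip(responses, totals) if row[item] == 1]
    mean_right = sum(right) / len(right)
    return (mean_right - mean_all) / sd_all * math.sqrt(p / (1 - p))


# Toy data: 8 students, 3 items (item indices are 0-based here).
toy = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0],
       [1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
print(difficulty_index(toy, 1))          # 0.5
print(discrimination_index(toy, 1))      # 1.0
print(round(point_biserial(toy, 1), 3))  # 0.894
```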

IV. ANALYSIS

A. Difficulty level of clusters
Smith and Wittmann [8] used the following clusters to analyze FMCE data: force sled (FS), reverse direction (RD), force graph (FG), acceleration graph (aG), Newton's third law (N3), velocity graph (vG), and energy (KE). We used these same clusters to analyze our data. In the KE cluster, the same questions are posed about velocity and kinetic energy. Because the difficulty indices of the velocity questions are intrinsically lower (by about 10% or more) than those of the kinetic energy questions, we suspected that some students answered the kinetic energy questions correctly without understanding the concept. To examine this hypothesis, we included the velocity energy (vE) cluster, a subset of the KE cluster, in our analysis in Table IV.
Table IV shows the average difficulty indices for the clusters in the group of students, with the corresponding SNS in the first column. The second column shows the number of students with the corresponding SNS, and the third column shows the percentage of the population whose scores are equal to or higher than that SNS. The values for the seven clusters represent the average difficulty index of the group. The average difficulty indices as a function of the SNS by cluster are presented in Fig. 3.
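The kind of tabulation behind Table IV can be sketched in Python: students are grouped by SNS, and the mean fraction of correct answers on one cluster's items is computed for each group. The function name, the record layout, and the toy data below are assumptions for illustration; the real cluster memberships are those of Smith and Wittmann [8].

```python
from collections import defaultdict


def cluster_difficulty_by_score(records, cluster_items):
    """Average cluster difficulty index for each SNS value.

    records: list of (sns, answers), where answers maps item number -> 0/1.
    cluster_items: item numbers that belong to the cluster of interest.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sns, answers in records:
        # Fraction of the cluster's items this student answered correctly.
        fraction = sum(answers[i] for i in cluster_items) / len(cluster_items)
        sums[sns] += fraction
        counts[sns] += 1
    return {sns: sums[sns] / counts[sns] for sns in sorted(sums)}


# Toy data: a hypothetical two-item cluster (items 1 and 2).
records = [
    (10, {1: 1, 2: 0}),
    (10, {1: 0, 2: 0}),
    (20, {1: 1, 2: 1}),
]
print(cluster_difficulty_by_score(records, [1, 2]))  # {10: 0.25, 20: 1.0}
```

Plotting the returned values against the SNS for each cluster gives curves of the kind shown in Fig. 3.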
The values in the FS cluster correlate linearly to the SNS.The set of five items in the FS cluster represents students' SNS values.The RD values are very selective; that is, only a low percentage of the population provided correct answers.The values drop sharply below the SNS of 30 points (81% score).The FG questions were difficult for students to answer correctly, with values below the SNS of 20 points (54% score).The aG questions appeared to be less difficult for students to answer correctly than the FG questions.The N3 questions were extremely difficult for students to answer correctly, and even students with high SNSs could not answer these questions well.This is consistent with results in English-speaking populations.The vG questions are of little use for the purposes of evaluation and assessment in this population because more than 90% of the students answered them correctly.Our finding that students with an SNS of 17 points answered more than 80% of the KE questions correctly indicates the ease with which the students answered these particular questions.However, the values for the vE questions are lower (not shown in Fig. 3), indicating that the difficulty indices for the KE questions included more false positives.

B. FS cluster
The FS questions, written in natural language, probe the relationship between force and motion. The MC is that an applied force is proportional to the velocity of the sled, the F ∝ v model. The corresponding proportions are shown in Table V in bold type. A very low value was determined for item 5 of the FMCEJ, a finding that is consistent with a "false-positive response" observed by Thornton and Sokoloff [2]. Table V shows that in the case of an increase in the sled's speed (±v up), students' responses reflecting their use of the F ∝ v model were as high as 70%. However, in the case of a decrease in the sled's speed (±v down), students' responses reflecting their use of the F ∝ v model dropped to around 30%. Instead, students' responses reflecting their use of the opposite force against the velocity direction model (the brake model) increased to 70%. A comparison of students' responses in which the F ∝ a model was used in the case of a decrease in the sled's speed and in the case of an increase in the sled's speed revealed that the former was 10% higher (not the F ∝ a model). Therefore, in the case of a decrease in the sled's speed, we estimate that 20% of students used the F ∝ a model and another 20% used a variation of the brake model to make up the 40% response rate of the F ∝ a model. Because 20% of the response rate of the brake model is actually the F ∝ a model, the rate of the common sense brake model would be 40% (60% − 20% = 40%). Thus, we estimate that 20% used the F ∝ a model, 30% consistently used the F ∝ v model, and 40% used the brake model. Less than 1% of students answered all five items (items 1, 2, 3, 4, and 6) correctly, and 9% of the top scorers' average had a difficulty index above 80%. Thus, we infer that less than 10% of the students used the F ∝ a model consistently.

C. RD cluster
The three subclusters include a set of three questions asking about the net force of an object on an inclined surface and the force of an object with free fall motion and its acceleration. The motion of the object includes the point where the direction of the motion reverses. According to Thornton and Sokoloff [2], students are considered to provide correct answers only when all three questions within a given subcluster are answered correctly. Smith and Wittmann [8] suggested the direction-of-motion model (F∥v), which includes the F ∝ v model. Table VI shows that 10% of the students used the F ∝ a model, 60% used the F ∝ v model, and 80% used the F∥v model to describe the net force of an object on an inclined surface (items 8, 9, and 10). A comparison of students' responses revealed that the percentage of correct responses was 5% lower for questions on the net force acting on a toy car on a ramp (items 8, 9, and 10) than for those on the force acting on a tossed coin (items 11, 12, and 13). The percentage of correct responses to questions on the acceleration of a tossed coin was higher than that on force. Because the concept of acceleration is more abstract than the concept of force, students' prior mathematics knowledge (i.e., acceleration as the derivative of velocity) might have contributed to this false positive. Only 52 of 1095 students answered the questions posed in all nine items correctly. The percentages of perfect correct responses for the sets of three questions were 6.7%, 6.5%, and 12.1% for the ramp, coin toss force, and coin toss acceleration subclusters, respectively. The highest percentage of correct responses in the acceleration subcluster might be attributed to students' prior mathematics knowledge.
Students' responses reflecting their use of the F ∝ a model to describe the force acting on a tossed coin that is at its highest point may be considered true Newtonian responses.If so, students' responses reflecting their use of the same model to describe the force on a tossed coin moving upward include false positives; a small percentage of these false positives may be attributed to students' use of the brake model.Students' responses reflecting their use of the F ∝ a model to describe the force acting on a tossed coin moving downward include false positives.Ten percent of these false-positive responses are likely attributed to students' use of the F∥v model.Because the F∥v model includes the F ∝ v model, the estimated percentage of the students who used the F∥v model was near 80% in the case of slowing down with the upward motion (items 8 and 11), whereas 30% was observed in the FS cluster (items 3 and 7).The influence of context may explain this difference.The brake model is brought to mind in the case of applying force by human action in the FS cluster.Table VI shows that 80% of students used the F∥v model and that 60% used the F ∝ v model, which is included in the former.The percentage of students who correctly answered the questions for all items in this cluster was only 5%.This is an expected result for this population.

D. Graphical representation (FG, aG, and vG)
For the FG questions, students were asked to refer to the force-versus-time graph corresponding to the horizontal motion of a toy car.The concepts probed here are identical to those of the FS cluster, the only difference being the graphical representation.
The Newtonian response is that an applied force is proportional to the rate of its velocity change, the F ∝ a model.Students' responses reflecting their use of the F ∝ a model were consistently around 15% for five items in the FG cluster (Table VII).This finding is in contrast to students' responses reflecting their use of the F ∝ a model in the FS cluster, which varied from 16% to 41% (Table V).The corresponding proportions are shown in Table VII in underlined type.
The MC is that an applied force is proportional to the velocity of the toy car, the F ∝ v model, which is similar to that of the FS cluster. The corresponding proportions are shown in Table VII in bold type. In the case of constant speed, students' responses reflecting their use of the F ∝ v model were as high as 80% and those reflecting their use of the F ∝ a model were 13%; these students' responses were very similar to those in the FS cluster. In the case of an increase in the toy car's speed (±v up), students' responses reflecting their use of the F ∝ v model were as high as 80%, similar to those in the FS cluster.
In the case of a steadily decreasing positive velocity, 14% of students (not shown in Table VII) chose the force presented by a straight declining line with a positive initial value (right direction) and a negative final value (left direction).We surmise that the students who chose this answer had decided to use a mixed model comprising the F ∝ v model and the brake model.Students' responses reflecting their use of the F ∝ v model dropped to around 56% and more (depending on how the 14% of students described earlier are classified), which is in contrast to 35% in the FS cluster (Table V).Students' responses reflecting their use of the opposite force against the velocity direction model (the brake model) decreased to 9% and more (again, depending on how the 14% students described earlier are classified), which is in contrast to 40% in the FS cluster.In either case, students used the brake model to a lesser extent in the FG cluster.We speculate that the abstract representation of graphics might encourage students to respond with more consistency and less intuition.
For the aG questions, students were asked to refer to the acceleration-versus-time graph corresponding to the horizontal motion of a toy car. Students' responses reflecting their use of the a = dv/dt model (the correct model) were consistently around 30% for five items (Table VIII). Compared with the correct response rate of the FG cluster, the rate was generally up by 15%. Students' responses reflecting their use of the a ∝ v model (the MC response) were consistently around 50%. On item 23 (a decrease in the toy car's speed), the Japanese students surveyed for the present study most commonly used the a ∝ v model (52%), whereas Smith and Wittmann [8] found that American students most commonly used the brake model. The brake model was used in only 10% of responses by students at KUT. The students at KUT consistently favored the a ∝ v (F ∝ v) model. On item 3 in the FS cluster, Japanese students favored the brake model as much as did American students. We surmise that the students' use of math knowledge (the a = dv/dt model) explains the consistency.
In the vG cluster, a correct response rate of 90% was determined for all questions except for item 41 (leftward motion), which had a correct response rate of 80%, and the v ∝ x model, which had a response rate of 8%.In general, the correct response rate was a few percentage points lower for questions involving leftward motion.We suspect that the unfamiliar situation (leftward motion) might have influenced students to answer incorrectly.
In general, we surmise that the graphical representation of questions encouraged the Japanese students to recall their mathematics knowledge and to use abstract ideas to a greater extent than did questions written in natural language.

E. N3 cluster
The N3 cluster comprised questions about two different situations (pushing and colliding) in which a truck and a car in contact apply a force on each other.The correct response rates for questions 36 and 38 about a small compact car pushing a large truck that has broken down were less than 7%, whereas the correct response rates for questions about a truck and car colliding were 17% and higher.The fact that Japanese students are not familiar with the situation of a vehicle pushing another vehicle that has broken down might have influenced the response rates.
Item 30 refers to a situation where a car and a heavy truck traveling at the same speed collide with one another. The inclusion of this item on the FMCE is to determine students' use of the mass dependent model. In our survey, 80% of the students used the mass dependent model and 17% used the N3 model to answer the question posed in item 30. Item 34 refers to a situation where a car collides with a stopped truck with the same mass as the car. This item is included on the FMCE as a way to determine students' use of the action dependent model. In our survey, 41% of the students used the action dependent model, 34% used the N3 model, and 18% provided uncertain answers for the question posed in item 34. Because the students used the mass dependent model and the action dependent model together, the percentage of uncertain answers was 50% in item 31, which refers to a collision between a slow-moving, heavy truck and a fast-moving, light car. According to Smith and Wittmann [8], American students were relatively consistent in their use of the action dependent model. We estimated that 17% of the students in our survey used the N3 model for items 30 through 34.
Items 35 through 38 refer to a situation in which a small compact car pushes a heavy truck that has broken down on the road.The following two scenarios are given: (1) the car, pushing the truck, speeds up and (2) the car, pushing the truck, slows down when the truck puts on its brakes.In our survey, 80% of the students attributed the car's increase in speed to the car exerting a stronger force on the truck, and 60% attributed the car's decrease in speed to the truck exerting a stronger force on the car.In either case, the students used the action dependent model, as Smith and Wittmann [8] previously pointed out in their analysis of the FMCE.The most common responses on items 36 and 38 in our survey were "c" and "b," respectively, which agree with the American responses.We estimated the Newtonian responses to be 6% (Table IX).The difficulty indices of items 30, 31, 36, and 38 are very low, as are the point biserial coefficient and discrimination index values (Table III).The level of difficulty of these questions is shown in Fig. 3.After effective instruction, many more students answered these questions correctly.As seen in the students' responses for item 31, a large number of the students used both the mass and action dependent models.Only 3% of the surveyed students correctly answered all the questions in the N3 cluster.

F. KE cluster
Items 44 through 47 refer to a situation in which a sled at rest at the top of a hill is released.The questions deal with the speed and KE of the sled after sliding down the hill.For items 44 and 45, the two hills have the same height but different steepness.The MC models are the slope dependent model (answer option "a") for the speed questions and the correct response (answer option "b") for the KE questions.If the students who stated that there was insufficient information to answer the questions posed in items 44 and 45 (answer option "d") are regarded, it can be presumed that 57% of the students attributed the greater speed of the sled to the steepness of the hill and that 40% attributed the greater KE of the sled to the steepness of the hill.
Items 46 and 47 refer to a situation in which a sled is released from rest at the top of a hill that is higher and less steep than the original hill, thereby adding the factor of height to the steepness.As shown in Table X, the most common responses were the correct responses for both the speed (42%) and the KE (52%) of the sled after sliding down the hill.Based on the sum of incorrect answers, it can be presumed that 58% of the students incorrectly attributed the greater speed of the sled to the steepness of the hill and that 48% of the students incorrectly attributed the greater KE of the sled to the steepness of the hill.The percentages are consistent for both cases.
In our survey, those students who answered the questions posed in items 46 and 47 favored the steepness model in item 46 (greater speed of sled) to a larger extent than in item 47 (greater KE).We suspect that speed is a less abstract idea than KE, which cannot be seen in everyday life.Thus, the true proportion is likely closer to the students' responses regarding the speed of the sled rather than the KE of the sled.
The most common incorrect response (around 20%) provided by the Japanese students for the questions posed in items 46 and 47 was that insufficient information was given to answer the questions (answer option "d").By contrast, the most common incorrect response given by American students was slope dependent (answer option "c"); that is, they incorrectly attributed the higher speed of the sled to the steepness of the hill.
Table XI shows the relationship between the sled's speed or KE and the conditions of the hill.The ratio of the intersection correct response of "steep" hill and "higher" hill (items 44 through 47) to the correct response of steep hill (items 44 and 45) is 96%, whereas the ratio to the correct response of higher hill (items 46 and 47) is 83% among the SNS high scorers.The ratios decrease to 50% and 21%, respectively, among the low scorers.The difference sets for the higher hill minus steep hill questions for the students with middle and low SNS values are 32% and 79%, respectively.Thus, the correct response of steep hill contains a higher ratio of understanding of energy conservation, especially for students with middle to low SNSs.
The ratio of the intersection correct response of steep hill and higher hill to the correct response of velocity (items 44 and 46) is 86%, whereas the ratio to the correct response of KE (items 45 and 47) is 91% among the SNS high scorers.The ratios are 38% and 25%, respectively, among the low scorers.The difference sets for the "KE" minus "velocity" questions for the students with middle and low SNS values are 25% and 75%, respectively, thus indicating that both middle and low scorers did not have a firm grasp of this concept.The ratio of the correct response to steep hill questions is lower than that to the higher hill questions, and the ratio of the correct response to speed questions is lower than that to KE questions for low scorers.
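The ratios discussed for Table XI are conditional proportions: among students who answered one pair of items correctly, the share who also answered all four energy items correctly. A minimal Python sketch follows, using the item groupings given in the text (steep hill: items 44 and 45; higher hill: 46 and 47; velocity: 44 and 46; KE: 45 and 47). The function names and the toy responses are illustrative assumptions.

```python
# Item groupings taken from the text.
STEEP = (44, 45)
HIGHER = (46, 47)
VELOCITY = (44, 46)
KE = (45, 47)
ALL_FOUR = (44, 45, 46, 47)


def all_correct(answers, items):
    """True if every listed item is marked correct (1) for this student."""
    return all(answers[i] == 1 for i in items)


def intersection_ratio(students, subset):
    """Share of students correct on `subset` who are correct on all four items."""
    subset_ok = [s for s in students if all_correct(s, subset)]
    if not subset_ok:
        return None  # ratio undefined when nobody answered the subset correctly
    both = sum(1 for s in subset_ok if all_correct(s, ALL_FOUR))
    return both / len(subset_ok)


# Toy responses for four hypothetical students.
students = [
    {44: 1, 45: 1, 46: 1, 47: 1},
    {44: 1, 45: 1, 46: 0, 47: 1},
    {44: 0, 45: 1, 46: 1, 47: 1},
    {44: 0, 45: 0, 46: 1, 47: 1},
]
print(intersection_ratio(students, STEEP))   # 0.5
print(intersection_ratio(students, HIGHER))  # 0.3333333333333333
```

A ratio close to 1 for a pair means that a correct answer on that pair rarely occurs without a correct answer on the other pair, which is how the text argues that one pair produces fewer false positives than the other.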
The high scorers had a firm grasp of the concepts of energy conservation, whereas the middle and low scorers used the correct concept conditionally.The questions about the sled's speed appear to be better indicators of students' understanding because there were fewer false-positive responses in general.The steep hill questions also included fewer false-positive responses among most students.Of the students surveyed, 25% correctly answered all four questions on energy conservation, indicating that they possessed a good understanding of its concepts.This value is substantially higher than the percentage of surveyed students considered to be Newtonian thinkers (<5%).Of the four items on energy conservation, item 44 is the most reliable estimate of correct energy concepts because this item had the lowest number of false positives.Item 44 also had the highest point biserial coefficient and discrimination index values in the cluster (Table III).

V. PRECONCEPTS: SUMMARY
In our survey, we found that the apparent proportions of Japanese students who used the Newtonian F ∝ a model, the F∥v model (which includes the F ∝ v model), and the brake model are roughly 10% to 20%, 40% and higher, and 40%, respectively. Although the students used a particular model depending on the context of the question, the percentage of students who used the F ∝ a model consistently is estimated to be less than 5%. The graphic representation produces more consistent use of the concept model among the students.
The common sense concept model most frequently used by students to answer questions on Newton's third law was the action dependent model, followed by the mass dependent model. The proportion of students with a good understanding of Newton's third law model is estimated to be less than 5%.
b The number of students who answered correctly on both "steep" hill and "higher" hill questions. c The number of students who answered correctly on both "velocity" and "kinetic" energy questions.
The common sense concept model most frequently used by students to answer questions on energy conservation was the steepness dependency model (three-quarters of the students used this particular model). The percentage of students with a good understanding of the energy conservation model is estimated to be less than 25%.
We conclude that, much like the situation that has been described in American high schools, only a small percentage of students graduating from high school in Japan are Newtonian thinkers.

VI. DISCUSSION AND CONCLUSIONS
This extended and detailed analysis of the responses of 1095 "average" Japanese students (as indicated by national testing) on the FMCEJ indicates that these students responded to multiple-choice questions (items) with keys (correct answers) and distractors (incorrect answers) in a manner consistent with the views found among American students. Basic statistics and the classical test theory indices of the FMCEJ indicate that its reliability and discrimination are appropriate to assess Japanese students' concepts about force and motion. The preconcepts of Japanese students assessed with the FMCEJ are quite similar to those of American students assessed with the FMCE, thereby further supporting the validity of the FMCEJ.
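For readers who wish to compute comparable classical test theory indices for their own data, the following Python sketch shows one common way to obtain the difficulty index (p), the point biserial coefficient (PBC), and the discrimination index (D) of a single item. It is a sketch under stated assumptions rather than the procedure used in this study: the function names and the toy scores are hypothetical, the PBC uses the population standard deviation of the total score, and the total score is assumed to include the item itself. Only the use of the top 25% and bottom 25% of scorers for D follows the definition given with Table III.

```python
import math
from statistics import mean, pstdev


def difficulty_index(item):
    """Proportion of students who answered the item correctly (0/1 scores)."""
    return mean(item)


def point_biserial(item, total):
    """Correlation between a 0/1 item score and the total score.

    Assumption: population standard deviation of the total score is used.
    """
    p = mean(item)
    sd = pstdev(total)
    if p in (0, 1) or sd == 0:
        return float("nan")  # undefined when there is no variation
    mean_correct = mean([t for x, t in zip(item, total) if x == 1])
    mean_wrong = mean([t for x, t in zip(item, total) if x == 0])
    return (mean_correct - mean_wrong) / sd * math.sqrt(p * (1 - p))


def discrimination_index(item, total, fraction=0.25):
    """Difference in item difficulty between top and bottom scorers."""
    n = max(1, int(len(total) * fraction))
    order = sorted(range(len(total)), key=lambda i: total[i])
    low, high = order[:n], order[-n:]
    return mean([item[i] for i in high]) - mean([item[i] for i in low])


# Hypothetical toy data for eight students (not FMCEJ results).
item_scores = [1, 1, 1, 0, 1, 0, 0, 0]           # 1 = correct on this item
total_scores = [40, 38, 30, 28, 25, 15, 10, 5]   # total test scores

p = difficulty_index(item_scores)
pbc = point_biserial(item_scores, total_scores)
d = discrimination_index(item_scores, total_scores)
print(p, round(pbc, 3), d)
```

With tied total scores, the membership of the top and bottom groups depends on how ties are broken; the sketch keeps the original order, which is only one of several reasonable conventions.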
Unfortunately, we could not address the subtle differences between the responses of American students and those of Japanese students on a few items. We hope that a future collaborative study using a database containing item-level responses for each American student may provide insight into these differences.
Our next study will compare the data of two groups of Japanese students (those who have received instruction via the traditional lecture format and those who have received instruction via interactive engagement) in terms of pretest scores, NCT scores, and gender.
Finally, this simple way of evaluating a translated concept inventory may help in validating the quality and compatibility of translations, thereby making cross-cultural comparisons more reliable in the future.

FIG. 1. The correlation between the SSSs and the SNSs of the FMCEJ. The correlation coefficient is r = 0.99 (N = 1095 students). The solid line shows a best fit (regression) to the data.

FIG. 3. The average difficulty index in each cluster with respect to the SNS. The values in the FS cluster correlate linearly with the SNS.

TABLE I. Basic statistics for two scoring systems: simple sum score and single number score.

TABLE II. Evaluation of the FMCEJ written by Japanese students.

TABLE III. Difficulty index (p), point biserial coefficient (PBC), discrimination index (D), and MC of the FMCEJ.
a The D value is based on data from the top 25% of scorers and the bottom 25% of scorers. b The 10 blank spaces in the MC columns of Table III indicate that no MCs were probed. The first 7 blank spaces correspond to the 7 items (5, 6, 15, 33, 35, 37, and 39) regarded by Ref.

TABLE IV. Average difficulty index in clusters in terms of SNS.

TABLE V. Response rates for force sled question (FS). Column headings: Item No.; F increase; F constant; F decrease; F = 0; −F increase; −F constant; −F decrease. Models: F ∝ a, F∥v, Brake.
a The corresponding proportions of the F ∝ v model are shown in Table V in bold type. b The corresponding proportions of the F ∝ a model are shown in Table V in underlined type. c Item 5 is inaccurate for assessment, according to Ref. [2].

TABLE VI. Response rates for reversing direction force and acceleration.
a F∥v is the direction-of-motion model suggested by Ref. [8], which includes the F ∝ v model.

TABLE VII. Response rates for force graph.

TABLE VIII. Response rates for acceleration graph.
a The corresponding proportions of the F ∝ v model are shown in Table VIII in bold type. b The corresponding proportions of the F ∝ a model are shown in Table VIII in underlined type.

TABLE IX. Response rates for Newton's third law.
a M and m are the masses of the truck and the car, respectively, V and v are the speeds of the truck and the car, respectively, and F and f are the forces exerted by the truck and the car, respectively. b Items 35 and 37 are inaccurate for assessment, according to Ref. [2].
a The proportions of the correct responses are shown in bold type.

TABLE XI. Robustness of the concept of energy conservation with respect to the SNS level.