Using action research to improve learning and formative assessment to conduct research

The paper reports on how educational research informed and supported both the refinement of introductory physics laboratory instruction and students' development of scientific abilities. In particular, we focus on how the action-research paradigm, combined with instructional approaches such as scaffolding and formative assessment, can be used to design the learning environment, investigate student learning, revise curriculum materials, and conduct subsequent assessment. As a result of these efforts we found improvement in students' scientific abilities over the course of three years. We suggest that the process used to improve the curriculum under study can be extended to many instructional innovations.


I. INTRODUCTION
This paper describes how the physics education research (PER) group at Rutgers University has developed and refined curricular materials for the laboratory component of "Physics for the Sciences," an introductory algebra-based course for science majors. Of equal importance, we report on how educational research informed, supported, and was used to evaluate our progress. Over a period of several years we developed a student-centered curriculum, ISLE (Investigative Science Learning Environment), and the corresponding active learning environment to engage students in guided inquiry.1 In parallel with the design and improvement of this course, we investigated student learning and the practices and interventions that facilitate it.
The work on designing and implementing the ISLE curriculum is deeply intertwined with our research on knowledge construction and facilitation, as both the course and the group's insights into science learning have developed concurrently and their influences have been mutual. The course has provided opportunities to investigate student science learning in a real context, thus giving ecological validity to our findings. Moreover, this course, with its particular demands and goals, its academic context, and its student population, has shaped the group's research. Reciprocally, we have built "Physics for the Sciences" based on the results of our research, and these results have guided us in the improvement of ISLE, a physics learning system in which students construct their knowledge and, at the same time, assimilate the practices of the scientific community by emulating the work of scientists.1,2 In this light, this substantial part of the work of the PER group at Rutgers can be framed in the "action-research" tradition, in which educators create and improve teaching practices and curricula in repeated cycles of inquiry and intervention. In the PER community, an example of adherence to the action-research paradigm in curriculum development is the process followed by the University of Washington's PER group, which has an established and well-documented practice of developing physics curriculum based on research findings.3,4 This paper focuses on the relationship between modifications of curriculum materials and student learning.6,7 Thus the work we present here is different from "traditional" studies of the relationship between research and learning, which mostly focus on student learning of physics content.
With this study we seek to answer two types of questions. Questions 1-3 are generic questions that relate to any instructional innovation, and questions 4-6 relate to the investigation of improvements in lab instruction.
Research questions: (1) How does the "action-research" approach to the revision of curriculum materials work in practice?
(2) How might instructors manage the introduction of new curricular materials? Should the changes be made and implemented simultaneously or progressively?
(3) What happens to the quality of student work when students are required to put more effort toward a particular aspect of learning?
(4) What can be done to improve the quality of students' experimental investigations?
(5) What are some of the steps that instructors may take to better support students' lab work?
(6) When does instructor help become inefficient or counterproductive?

II. THEORETICAL FRAMEWORK
How does one develop a successful instructional innovation? Following the design sequence proposed by Dick and Carey, we consider that deciding on the goals of instruction should be the first step.8 The next step is to determine the abilities and behaviors needed to achieve the chosen goals. We then establish the performance objectives and choose the assessment tools. Only at this point do we proceed with the development of the instructional strategies and materials that will help us attain the goals. To ascertain how well the intervention works, we need to conduct formative assessments and revise the instructional materials accordingly. The process of trying and modifying strategies and materials usually needs to be repeated several times in order to attain the desired results.
We find an analogous sequence of steps both in the action-research approach and in the formative assessment approach. Below we show that the two can be combined when developing curriculum materials.

A. Action research
Action research is an inquiry tradition that researchers in different social sciences follow in order to understand complex situations and interactions, and to improve practice. In the action-research paradigm, the practitioners are at the same time the researchers. The knowledge product emerges as they collect data and formulate and refine hypotheses. They thereby improve their practices and advance knowledge, bridging theory and practice at the same time. Kolb (1984) described action research as a learning cycle through which people construct knowledge by analyzing social contexts and learners' performance, creating theoretical models that reflect their experiences, and testing these models in new situations.9

B. Formative assessment
In their seminal paper, Black and Wiliam showed that formative assessment is one of the most effective educational interventions.10 Formative assessment is assessment of student learning whose goal is to provide feedback for the immediate improvement of the teaching and learning process (pp. 7 and 8, Ref. 10). In their later book, Black and colleagues11 expanded the ideas of formative assessment into a theoretical framework called AFL (assessment for learning), in which the primary goal of assessment is to promote student learning. Within this framework, instructors modify their teaching, and students modify their learning, based on the feedback provided by the assessment activity. Without such changes in the instructional process, formative assessment is not complete: if teachers do not revise their instruction based on student feedback, improvement does not occur.
As we see from the above descriptions, the AFL framework and the action-research framework follow the same pattern. Therefore we can integrate them to simultaneously refine our understanding of student learning and improve that learning. In this paper, we show how this combined theoretical framework can be applied to a specific example: scaffolding the development of scientific abilities in introductory physics labs.

C. Scaffolding
If one of the goals of instruction is to help students think like scientists, then we need to recognize that the thinking processes and procedures used by scientists are very demanding for students, because they require the exercise of a large and complex set of methods and practices that we call scientific abilities. Facing these considerable difficulties, laboratory instructors can opt to make things easier by telling students what to do and how to do it (as is done in traditional instructional laboratories). However, this approach of presenting learners with an oversimplified version of scientific experimentation deters them from understanding and seeing the value of scientific reasoning and strategies.12 We believe that students can successfully engage in meaningful experimental investigations if we provide them with the necessary support and learning opportunities. This support comes in the form of scaffolding.14,15 The support is then gradually withdrawn, so that the learners assume more responsibility and eventually become independent.
In our ISLE design labs students often encounter unfamiliar and complex tasks that require them to work in the zone of proximal development.16 Vygotsky's zone of proximal development is "the distance between the level of an individual's independent problem solving and her or his problem-solving capabilities, under guidance or in collaboration with peers" (p. 86, Ref. 16). In such situations, scaffolding becomes a necessary and crucial part of instruction. The second function of scaffolding applies to ISLE labs as well: the labs include rubrics, prompts, questions, and exercises that are meant to help students construct an understanding of the principles underlying scientific investigations, and therefore to prepare them for generating new knowledge and solving new problems. One of our major goals is that when students leave the physics lab, they retain this understanding and are able to activate their scientific abilities in different contexts. Hmelo-Silver distinguished between two types of scaffolding: black-box and glass-box scaffolding. Black-box scaffolding facilitates learning by performing the task for the learner, without requiring the learner to understand the process. In contrast, glass-box scaffolding makes implicit processes explicit (allowing the learners to understand what support they receive and why they need it), so that ultimately they internalize the scaffolding and become independent learners.17 The scientific abilities rubrics that we provide in ISLE labs are a clear example of glass-box scaffolding.
Holton and Clarke classified scaffolding by agent (expert, peer, or self-scaffolding) and by domain (conceptual or heuristic scaffolding).18 To promote the ultimate goal of learners' independence, instruction must gradually shift the source of scaffolding from the teacher (or expert) to the learners. Learners' self-scaffolding is virtually equivalent to their own construction of knowledge, sustained by metacognitive thinking. Reassigning the charge of providing scaffolding from experts to learners opens the possibility of learners progressively taking responsibility for and directing their own learning. This shift in the agent who delivers scaffolding is possible through peer scaffolding during collaborative work (also referred to as reciprocal scaffolding) as well as through instructors' heuristic scaffolding. Heuristic scaffolding refers to support that enhances learners' development of their own approaches and procedures for learning or problem solving, while the purpose of conceptual scaffolding is to provide assistance in the acquisition of new concepts. For example, when students are solving a problem involving Newton's second law, we can ask them to draw a force diagram; this is heuristic scaffolding. When we ask them whether the magnitude of the normal force in this particular situation is greater or less than the magnitude of the gravitational force exerted on the system, this is an example of conceptual scaffolding. There are many different ways of providing heuristic scaffolding for learners, such as asking them generic questions to help them focus on important aspects of the procedure, or questions that encourage them to reflect on their actions and choices. For example, Schoenfeld proposed that instructors should ask the following three questions at any time: What (exactly) are you doing? Why are you doing it? How does it help you?19 Other researchers have recommended the use of sets of cards to help students recall appropriate metacognitive actions: I thought about what I already knew; I made a plan to work it through; I thought about a different way to solve the problem.20 According to Holton and Clarke, heuristic scaffolding facilitates the learning of the approaches and procedures that promote independent knowledge construction and problem solving (p. 131, Ref. 18).
In our labs, we set the goal of helping students develop the heuristics of scientific practices. To do so, we use heuristic scaffolding aimed at the development of scientific abilities. This scaffolding is generic and reflective by nature and comes in three different modes: (i) guiding prompts and questions in the handouts; (ii) scientific abilities rubrics; and (iii) TAs' comments on written work, oral feedback, and summary advice.
This study focuses on the revisions and gradual refinement of prompts and questions that we made over three years (2004-2006) based on formative assessment of student work done through the scientific abilities rubrics. This approach allowed us to conduct an action-research study on student learning of scientific abilities, develop a set of instructional interventions to improve student acquisition of scientific abilities, and simultaneously reflect on the generic features of scaffolding that improve or deter student learning.

III. LABS: GOALS SELECTION, DEVELOPMENT, AND REFINEMENT
The labs described in this study are an integrated component of a lecture-lab-recitation course based on the ISLE curriculum. According to the course structure, students spend 3 h every week in the instructional lab. The goals of the course are to help students construct physics concepts and to develop scientific abilities. To be consistent with these global objectives, the goals of the lab component parallel the course goals, with an even greater emphasis on students' development of scientific abilities.2,6,7 To devise individual laboratory sessions we took into account the material that students learned during a particular week in whole-class meetings (the "lecture" component was referred to as whole-class meetings, as there was no traditional "lecture" in the course) and in collaborative problem-solving recitations. We identified the specific scientific abilities that fit best with the content. Then we determined, explicitly and in detail, the learning goals (local goals) for each laboratory session (these are both content goals and scientific abilities goals). After that, we selected the experimental tasks that were best suited to help students achieve the goals and wrote lab handouts. The handouts provide clear heuristic glass-box scaffolding: prompts and questions that focus students' attention on different aspects of scientific inquiry. We also chose the relevant rubrics that helped students plan, monitor, and evaluate their work. As the semester progressed this last component faded and finally disappeared. When students performed the lab, they wrote lab reports in which they described their experiments, answered the handout questions, and self-assessed their work based on the selected rubrics. They handed in their reports at the end of the 3 h lab session.
After students handed in their lab reports, we photocopied them. At the end of the semester, we scored the reports according to the same rubrics that students used when working in the labs. We analyzed the scores and determined which abilities students were not developing optimally. We then revised the handouts to provide better scaffolding where needed. We repeated the process every year, giving new students the revised handouts, scoring their reports to assess our work, and modifying the handouts accordingly.

IV. DESCRIPTION OF THE STUDY
The study was conducted during three consecutive years (2004-2006) of the implementation of an introductory algebra-based physics course that followed the ISLE curriculum. The course spanned two semesters and comprised 185 students who were science majors (examples of the majors are biology, chemistry, environmental science, and exercise science). There were two 55 min lectures, one 80 min recitation, and a 3 h lab per week. Although we did not pretest students, we can assume that the student population in the course was relatively stable over those years: the course prerequisites as well as advising strategies did not change. Rutgers' incoming student population did not change over those years either, based on data collected by the Office of Institutional Research and Academic Planning (incoming freshman SAT scores changed by less than 1% over the last 10 years).
The researchers (the authors of this paper) were associated with the development and the teaching of the course at different stages of the study (for example, MR-V and AK were lab instructors in 2005 and 2006, EE trained lab instructors during 2005 and 2006, and SM was the course instructor during 2004). All authors were involved with developing and revising labs at various stages of the study.

A. Data sources
For this study we collected the following data: observations of student behavior in the labs, analyses of lab reports, and scores of lab reports based on the selected rubrics. Instructors recorded observations of student behavior in the labs during the school year by making notes after each lab. As described in the previous section, students' lab report scores were important for the redesign of the lab materials. We present the rubric scores of students' lab reports for each of the three years as part of this study.
The reliability of the rubric scores of the lab reports was established in the following way: two trained scorers chose four to seven lab reports for each lab for initial scoring. These reports were rescored until the scorers achieved 100% agreement. Then they proceeded to score about 20% of the lab reports in the sample. Once their agreement exceeded 90%, the scorers split the rest of the lab reports and scored them separately.
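To make the agreement criterion concrete, the percent-agreement measure used above can be sketched as follows. The rubric scores below (on the four-level 0-3 scale used by the scientific abilities rubrics) are hypothetical, chosen only for illustration.

```python
# Sketch of the inter-rater reliability check: two scorers rate the same
# set of lab reports on a 0-3 rubric scale, and we compute the fraction
# of reports on which their scores agree exactly.

def percent_agreement(scores_a, scores_b):
    """Fraction of items on which two scorers gave identical rubric scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scorers must rate the same set of reports")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)

# Hypothetical scores from two scorers on ten lab reports.
scorer_1 = [3, 2, 3, 1, 2, 3, 0, 2, 3, 3]
scorer_2 = [3, 2, 2, 1, 2, 3, 0, 2, 3, 3]
agreement = percent_agreement(scorer_1, scorer_2)
print(f"{agreement:.0%}")  # prints: 90%
```

By this criterion, the two scorers above would have cleared the 90% threshold and could split the remaining reports between them.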
In this study the rubrics for the following scientific abilities were chosen: the ability to evaluate uncertainties, the ability to identify assumptions, the ability to evaluate the effects of assumptions, and the ability to evaluate a result by a second method. See Appendix A for a description of the above rubrics and Ref. 2 for a description of all scientific abilities rubrics.

B. Lab development, data collection, and lab revisions
In the labs, students working in groups of three or four designed their own experiments guided by heuristic scaffolding questions. In the first version of the lab handouts this scaffolding contained: (i) the goals of the lab, (ii) guiding questions that focused students' attention on what to think about when designing the experiment, and (iii) reflection questions that asked students to connect their lab experiences to everyday experiences.
In addition, students were advised to use the rubrics for self-assessment. The rubrics were a form of expert heuristic scaffolding. At the same time, they encouraged reciprocal scaffolding between students working collaboratively on an experiment. The nature of the experiments required significant cooperation among group members, as none of the students could solve the problems individually. This naturally led to reciprocal scaffolding. However, this reciprocal scaffolding was usually conceptual, as students were mostly focused on how to solve the experimental problem from the content point of view.
The first full implementation of the design labs was in the fall of 2004. We observed students in the labs and made copies of their lab reports. After the semester was over, we compiled the observations and scored the reports for the labs at the beginning and at the end of the semester. We found that from lab 4 to lab 10 students improved significantly in their ability to design an experiment, to devise a mathematical procedure, and to communicate the details of the procedure;2 however, they did not demonstrate similar improvements in their ability to evaluate experimental uncertainties and assumptions (Fig. 1, results for 2004). In addition, a careful examination of each lab showed specific difficulties that students could not overcome. At the end of the spring semester we conducted a survey regarding students' expectations and the goals of the labs. The analysis of students' responses showed that they understood that the labs were helping them develop experimentation abilities but did not see the connection between the labs and their future work.21

FIG. 1. Bars represent the percentage of the students whose lab reports received high scores: either "adequate" (score of 3) or "needs improvement" (score of 2). We do not have data on the rubric "ability to evaluate the result by second method" in lab 4 in year 2004.

Based on the scoring of the reports, the observations, and the survey, we revised the labs. The revisions went along several directions: we provided more heuristic scaffolding in addition to the questions in the lab handouts, focusing student attention on experimental uncertainties and the validation of assumptions in every lab; we enhanced the reflection questions, connecting them to students' future jobs; and we revised the experiments and made the handouts more student friendly. Specifically, we added special documents describing in detail how to evaluate experimental uncertainties. We also specified a set of rubrics for self-assessment for each lab, we added a question "why did we do this lab" asking students to connect the abilities that they used in the lab to their future work, and we added interesting short stories to each lab handout that showed students the importance of scientific abilities in the everyday world. These stories were very short, related to current physics and biology applications, and had the major goal of increasing student motivation. In addition we made changes in each lab that were specific to the content of that particular lab. For example, in several labs we decreased the number of experiments that students had to design to give them more time to reflect on and revise their experimental work and writing.
The following year (2005) we repeated the procedure. We found an increase in students' ability to evaluate uncertainties (Fig. 1, data for 2005) but did not find much improvement in the ability to evaluate the effects of assumptions. Thus for 2006 we added more exercises to help students learn how to evaluate the uncertainty and how to evaluate the assumptions. We also made another change in the lab handouts: we removed the interesting stories, as we found them distracting. The last change that we implemented was lab homework. We realized that students faced a multitude of decisions in the lab when they had to design their own experiments. They could not allocate the time and resources needed to ponder important aspects of scientific investigations, such as the differences between types of scientific experiments.5 In addition, homework would give students practice in their incipient scientific abilities, such as evaluating uncertainty or determining the effects that their assumptions might have on the final results. The lab homework was divided into two categories: (i) practice problems that gave students an opportunity to practice the scientific abilities that we found to be the most difficult to develop, such as estimating the uncertainty in experimental results, and (ii) analyses of historical research (for instance, the discovery of prophylactics) or of scientific investigations of poignant interest (for instance, research on how the simian immunodeficiency virus could have crossed the species barrier and infected humans). For the second type of exercise students had to read passages describing the investigations. These passages averaged two pages and were adapted from various sources. After reading a passage, students had to answer several questions about the type of experiments described, the sequence of experiments that constituted the entire investigation, and the reasoning that guided the inquiry. These exercises focused students' attention on the dynamics of scientific investigations (how research is motivated and how it progresses and evolves). By reflecting on the passages, students were supposed to develop read-out strategies22 for the elements of scientific research. If students were able to identify these elements and to recognize the different aspects of the scientific modus operandi, they would acquire a more advanced and nuanced understanding of what a scientific investigation entails. These deeper insights could affect their own scientific inquiry positively, because it would become more deliberate and better rooted in the practices of the scientific community. It is important to note how these homework assignments differed from the interesting stories that we removed from the lab handouts. While the primary goal of the "interesting stories" was to increase student motivation, the historical passages had the purpose of showing students how a specific piece of scientific knowledge was constructed over the years. They were much longer than the interesting stories, and ended with questions aimed at the understanding of the process involved in the discovery.
At the end of 2006 the lab handouts contained the following elements: (i) learning goals of the lab that specified which abilities would be the focus of that particular lab; (ii) the rubrics on which students needed to focus for self-assessment; (iii) the actual lab tasks, with questions that students had to answer during the lab that helped them build a particular ability; (iv) special lab task-related exercises that focused on the elements of the scientific abilities; (v) lab reflection questions that were included right in the experimental design part of the lab and at the end of the lab; and (vi) special exercises, usually done as homework, that asked students to read and analyze a summary of a historical scientific development (unrelated to the lab task) that illustrated the actual application of a particular ability. Students had to answer questions that helped them reflect on the role of the abilities under analysis.
In addition, the lab instructors provided extensive written and oral feedback to the students. The students also had an opportunity to revise their lab reports after they were graded. After these additions, we repeated the research procedure. Table I shows the steps we took in different years, aimed at the improvement of various abilities.

C. Specific heat lab
Here we describe in detail the process of improving the materials for one lab in which students had to conduct a calorimetry experiment (Lab 10 of the semester). We present the original text of the task, the results of scoring the lab using the scientific abilities rubrics, and the subsequent revisions of the materials. Appendix B contains the three versions of this lab. We have chosen this particular laboratory to show the revision process in detail for the following reasons: (i) To complete this lab, students need to apply many different scientific abilities.
(ii) The outcomes of this lab are extremely sensitive to the experimental procedure. In all calorimetry experiments, thermal energy escapes the system, resulting in a considerable temperature drift that leads to a significant systematic error. Therefore, the ways in which the assumptions implicit in the procedure may affect the results are especially important to consider. To complete the task successfully, students have to be aware of the assumptions they make, whether the assumptions are valid, and how the assumptions may affect the result.
(iii) The lab is at the end of the semester; thus we can observe and analyze the cumulative effect of the lab curriculum.
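As a point of reference for the discussion that follows, the mathematical procedure in such a calorimetry task typically rests on an energy balance between the heated object and the water, under the assumption (which the lab asks students to question) that no thermal energy leaves the system. One common form is sketched below; the symbols are ours, not taken from the handouts: m_o, c_o, T_o are the mass, specific heat, and initial temperature of the object, m_w, c_w, T_w those of the water, and T_f is the equilibrium temperature.

```latex
% Energy balance: thermal energy lost by the object equals that gained by
% the water (assuming no heat exchange with the surroundings or container).
m_o c_o \,(T_o - T_f) = m_w c_w \,(T_f - T_w)
\quad\Longrightarrow\quad
c_o = \frac{m_w c_w \,(T_f - T_w)}{m_o \,(T_o - T_f)}
```

If thermal energy escapes during the transfer, the measured T_f is lower than the ideal value, shrinking the numerator and enlarging the denominator; this is exactly the kind of systematic effect the assumption analysis asks students to trace through the result.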
Using information from lab reports written during Fall 2004 and observations made by the lab instructors, we were able to identify the aspects of the lab that needed improvement. In Table II we show the intended goals that students failed to achieve and the changes that we introduced in the lab handout to help students better reach those goals. Students' main problems were with the ability to calculate the uncertainty in their results and with the ability to determine the way in which the assumptions implicit in the mathematical procedure could have affected the results.23

In 2005, observations of student behavior in the labs showed that this time students had time to work on the two proposed experiments for specific heat. However, trouble appeared when they compared the results for the specific heat of an unspecified metal obtained from these two experiments and found that the two values were different. As they did not expect to have to make improvements and repeat the procedure, students were frustrated and disappointed. Another problem was that students were not able to measure the equilibrium temperature with the alcohol thermometers that we made available to them. For example, one group of students wrote: "Description of the experiment: We will then quickly put object in the water. We will observe the equilibrium temperature and record it. What went wrong was that equilibrium temperature was read incorrectly."

TABLE II. Analysis of students' difficulties in 2004 and the steps we took in 2005 in response. The right column describes the change as related to a specific ability (labeled SA) with a corresponding rubric, or a general issue that is not assessed by rubrics.

Students' difficulties: Students did not have time to complete a detailed investigation for the second lab task (specific heat).
Our steps: We deleted the first lab task. We suggested how to reduce the time students spent on writing.
Issue: General (time management)

Students' difficulties: More than 30% of students did not design two independent experiments to determine the specific heat.
Our steps: We emphasized the importance of designing two experiments.
Issue: SA: Evaluating result by second method

Students' difficulties: Students only said that the assumptions would affect the result, but they did not specify how: "If any assumption is false it will change the final outcome of the experiment and our value will be wrong."
Our steps: We encouraged students to describe the effects of assumptions in detail and provided them with an example.
Issue: SA: Identify, evaluate and validate assumptions (increased scaffolding)

Students' difficulties: Almost 70% of students did not evaluate uncertainties or did it inadequately.
Our steps: We helped students with the routine steps of evaluating uncertainties.
Issue: SA: Evaluate uncertainty

Students' difficulties: Students often described how uncertainties could be minimized at a superficial level: "Uncertainty lies in the temperature reading and the mass reading. Minimize by using more exact equipment."
Our steps: We encouraged students to minimize uncertainties in the experiment.
Issue: SA: Minimize uncertainties

Students' difficulties: Only 25% of students were able to evaluate uncertainties adequately and propagate them to the final value.
Our steps: We emphasized the importance of including the uncertainty in the final value.
Issue: SA: Evaluate uncertainties

Students' difficulties: After obtaining the result for specific heat through the second independent method, most students understood that their main assumption, that the system's energy was not lost during the experiment, was wrong. In most cases students arrived at the correct conclusions about what should be done to improve the results: "We conclude that this relationship is applicable if you do quick transfer of one substance to another." "Based on the results of two experiments they are out of range because our assumptions were wrong. There is a lot heat lost in the transfer of the object…" However, they usually did not repeat their experiment using an improved design.
Our steps: We encouraged students to make a real improvement in the experimental design and repeat the experiment to get a satisfactory result.
Issue: SA: Evaluating result by second method
The scoring at the end of the 2005 Fall semester showed an improvement in all of the abilities; still, many students had difficulties evaluating uncertainties and considering assumptions. Many students readily repeated the assumptions given as the example in the handout without developing any further ideas. This made us speculate that giving learners this type of support prevents them from advancing toward the goals or, at least, slows down their progress. It was also apparent that many students did not know how to validate assumptions. Remarks like the following were common: "Our assumptions are valid because otherwise our value for c would be inaccurate." On the other hand, the reports demonstrated a substantial improvement with respect to the previous year, as some students were able to determine how assumptions might affect the results: "If the assumptions were incorrect heat would escape in experiment 1 and heat would enter in experiment 2. This would make our specific heat appear smaller in experiment 1 and would appear larger in experiment 2…" Some students revised their experiments and thereby improved them: "The two results do not match. This is most likely due to a failure to have the object in one of its baths for the appropriate time… New results c = 589.6 J/°C kg... We did not have time to repeat the second experiment. However, the first experiment was improved, and moving the object faster and measuring faster allowed for better results." To respond to students' difficulties and frustration, we adjusted the handout again, seeking to address students' persistent difficulties. In Table III we show the students' difficulties that we encountered during the 2005 lab implementation and after analyzing students' reports, and the corresponding changes that we made to tackle them in 2006.
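To illustrate the kind of uncertainty propagation the handouts asked students to carry out, here is a sketch using the simple worst-case method (summing fractional uncertainties) common in introductory labs. The measured values and their uncertainties below are hypothetical and do not come from student data; the formula is the no-heat-loss calorimetry relation for the specific heat of the object.

```python
# Sketch of uncertainty propagation for the calorimetry lab:
# c_o = m_w * c_w * (T_f - T_w) / (m_o * (T_o - T_f)).
# All numbers below are hypothetical, chosen only for illustration.

C_WATER = 4186.0  # specific heat of water, J/(kg*degC)

def specific_heat(m_w, m_o, T_w, T_o, T_f):
    """Specific heat of the object from an energy balance (no heat loss assumed)."""
    return m_w * C_WATER * (T_f - T_w) / (m_o * (T_o - T_f))

def fractional_uncertainty(m_w, dm_w, m_o, dm_o, T_w, T_o, T_f, dT):
    """Worst-case fractional uncertainty of c_o: sum of the fractional
    uncertainties of each factor; each temperature difference carries 2*dT."""
    return (dm_w / m_w + dm_o / m_o
            + 2 * dT / abs(T_f - T_w)
            + 2 * dT / abs(T_o - T_f))

c = specific_heat(m_w=0.200, m_o=0.100, T_w=20.0, T_o=90.0, T_f=25.0)
frac = fractional_uncertainty(0.200, 0.001, 0.100, 0.001, 20.0, 90.0, 25.0, 0.5)
print(f"c = {c:.0f} +/- {c * frac:.0f} J/(kg*degC)")  # prints: c = 644 +/- 148
```

Note how the small temperature difference between the water and the equilibrium state dominates the uncertainty; recognizing this is one of the abilities the "evaluate and minimize uncertainties" rubrics target.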
In summary, by analyzing lab reports (scoring them with the rubrics) and observing students' behavior in the labs, we assessed students' learning, in particular their acquisition of scientific abilities. We identified the areas in need of improvement and modified the lab handout to address those weaknesses by providing better scaffolding. We repeated the same process twice and evaluated the adequacy of the changes by comparing students' performance (as shown in their reports) in different years.

Students' difficulty: 38% of students did not evaluate uncertainties, or did so incorrectly.
Our step: From the beginning of the semester we added homework exercises to improve students' ability to take into account uncertainties and assumptions.
Issue: SA: Evaluate uncertainty. Evaluate and validate assumptions.

Students' difficulty: Students did not expect to repeat experiments and felt frustrated and disappointed.
Our step: We warned students beforehand about the possibility of repeating the experiment.
Issue: General (students' expectations).

Students' difficulty: Students were not able to measure the equilibrium temperature with alcohol thermometers.
Our step: We substituted digital thermometers for alcohol thermometers, thus allowing students to observe temperature change in time.
Issue: General (equipment issue).

Students' difficulty: Many students readily took the example assumption as something that they had to write.
Our step: We deleted the example.
Issue: SA: Identify and evaluate assumptions.

Students' difficulty: Many students did not know how to validate assumptions: "Our assumptions are valid because otherwise our value for c would be inaccurate."
Our step: We gave more detailed instructions on how to determine whether an assumption is valid.
Issue: SA: Identify and evaluate assumptions.

V. RESULTS
Prior to this project, our group had identified 48 different scientific abilities that apply to various aspects of scientific investigations, and had developed corresponding rubrics that describe four distinct levels of competence for each of the abilities.2 However, for the sake of clarity in our exposition and due to space considerations, we have chosen to report on only the four scientific abilities that have been found most difficult to acquire: specifically, the ability to estimate the uncertainty in the result, the ability to identify the assumptions implicit in the mathematical procedure, the ability to evaluate how the assumptions may affect the result, and the ability to evaluate the result by means of an independent method.
Figure 1 presents the percentage of students who received high scores on different scientific abilities in labs 4 and 10, as assessed by the rubrics, over the course of three years. Three patterns emerge from the graphs: (1) students' performance on scientific abilities improved during the course of a single semester, every year; (2) students' competence at the end of a semester improved steadily during these three years; and (3) for some of the abilities, the changes made in the first three labs resulted in improved performance on lab 4.
We used the chi-square test to evaluate the difference in students' performance between consecutive years, both at the beginning (lab 4) and at the end (lab 10) of the fall semester. For lab 4 the difference in student achievement between successive years was nonexistent or very small for most of the abilities, except the ability to evaluate uncertainty (see Table V). For this last ability there is a highly significant difference between the years 2004 and 2005. For the ability to evaluate the result by means of an independent method we do not have data available for 2004.
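As an illustration of this comparison (with hypothetical counts, not the study's data), the Pearson chi-square statistic for two years of rubric scores on one ability can be computed from a 2 × 4 contingency table, which yields the df = 3 reported in Tables V and VI:

```python
# Minimal sketch of a year-to-year chi-square comparison.
# The counts are hypothetical: numbers of lab reports at each rubric
# level (score 0 through score 3) for one ability in two years.

def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

counts_year1 = [22, 18, 9, 3]    # hypothetical reports per rubric score
counts_year2 = [6, 14, 20, 12]
stat, df = chi_square([counts_year1, counts_year2])
# For a 2 x 4 table, df = (2-1)(4-1) = 3; the 0.05 critical value is 7.815.
print(f"chi2 = {stat:.2f}, df = {df}, significant: {stat > 7.815}")
```

The same computation (plus a p-value) is available as `scipy.stats.chi2_contingency` for readers who prefer a library call.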

Specific heat lab
In this section we report in more detail on the level of competence, in terms of scientific abilities, demonstrated in students' reports for the calorimetry lab. The task for this lab was to find the specific heat of an unknown material. The rubric scores of students' lab reports for the four abilities that are most difficult to attain improved significantly year after year. Figure 2 clearly reflects this considerable improvement. The darker top portion of the bars represents the percentage of students who received perfect scores. This percentage increased through time for all four scientific abilities. For a detailed statistical analysis of the results see Table VI, which shows that this increase is statistically significant. This table displays the χ² values for the change in scientific abilities scores between consecutive years.
Students demonstrated in their reports a higher competence level on each of the abilities year after year. This competence was not built up simultaneously for all the abilities, and the changes followed a different pattern for each of them. We presume from the data that the improvement demonstrated in each of the abilities at certain points in time reflects the focus of our efforts (Table VI).
In the following paragraphs we present an account of how the modifications that we made each year to the lab handouts, to address the inadequate acquisition of each of the chosen four abilities, may have impacted students' performance.

TABLE V. Chi-square values for the differences in scores in lab 4 between consecutive years (df = 3). Note: a single asterisk represents a significant difference and triple asterisks represent highly significant differences.
Ability to evaluate uncertainties. Table III shows the significant changes that we made for 2005 to emphasize the necessity of evaluating experimental uncertainties. As a result, in 2005 the number of students who did not attempt to evaluate uncertainties was virtually zero. In 2005 students showed a significant improvement in mastering the ability to evaluate uncertainties compared to 2004. Over four times as many students received perfect rubric scores (black bar in Fig. 2) on this ability in 2005 (35%) compared to 2004 (8%). During 2006, we implemented in labs 1, 3, 4, and 8 a set of homework exercises specially designed to facilitate the development of this ability. The homework showed students how to evaluate uncertainties and gave them the opportunity to practice before the lab, where many other things need to be attended to. As a result, in 2006 students' performance on this ability was even better than in 2005. If we combine the students who received scores of "needs improvement" (i.e., most uncertainties are evaluated; see Appendix A) and "adequate" (all uncertainties are evaluated correctly), then over 90% of students were able to master this ability in 2006 (black and dark grey bars in Fig. 2).
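To give a concrete flavor of what such an exercise might involve, here is a hypothetical sketch of the bookkeeping; the measured values, the instrument uncertainties, and the simple worst-case rule (fractional uncertainties of the factors add) are our illustrative assumptions, not taken from the course materials:

```python
# Hypothetical uncertainty bookkeeping for a specific-heat measurement,
# assuming the simple relation c = c_water * m_w * dT_w / (m_o * dT_o)
# and the worst-case rule that fractional uncertainties of factors add.

c_water = 4186.0            # J/(kg K), specific heat of water

# (value, absolute uncertainty) pairs -- hypothetical measurements
m_w,  dm_w  = 0.200, 0.001  # kg of water
m_o,  dm_o  = 0.150, 0.001  # kg of object
dT_w, ddT_w = 6.0,   0.5    # K rise of water
dT_o, ddT_o = 55.0,  1.0    # K drop of object

c = c_water * m_w * dT_w / (m_o * dT_o)

# Worst-case rule: fractional uncertainties of the factors add.
frac = dm_w / m_w + dm_o / m_o + ddT_w / dT_w + ddT_o / dT_o
print(f"c = {c:.0f} J/(kg K), relative uncertainty = {100 * frac:.0f}%")
```

Note that the thermometer readings dominate the budget here, which is one reason the switch to digital thermometers (Table III) mattered.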
Ability to identify assumptions. In 2005 we attempted to address students' difficulty in identifying assumptions. The modified handouts used in 2005 had explicit scaffolding that resulted in much higher scores for the ability to identify assumptions than the scores in 2004. However, our observations suggested that these scores might not represent real improvement, because instead of scaffolding students' own work we were giving the answers in the handouts. In 2006, we paid special attention to this ability during the course, made new changes in the lab handouts, and removed the explicit answers. As a result, in 2006, when they had to figure out all the implicit assumptions by themselves, students received scores as high as in 2005. Over 90% of students in 2005, and all students in 2006, were able to identify most of the assumptions (scores "needs improvement" and "adequate" in Fig. 2).
Ability to evaluate the effects of assumptions and validate them. We found that the ability to determine how assumptions implicit in the procedure might affect the results was difficult to master. In Fig. 2, note the combined percentage of students who received rubric scores corresponding to "needs improvement" (dark grey) and "adequate" (black). This percentage refers to the proportion of students who were at least able to evaluate the effects of assumptions (score: needs improvement; see Appendix A), including some who were also able to validate their assumptions (score: adequate). We shall refer to these combined scores as high scores. Only 15% of students received high scores in 2004. Our attempts to scaffold students in this ability were unsuccessful in both 2004 and 2005. Students' competence on this ability was slightly better in 2005 (25% received high scores) than in 2004, but the difference was not significant. In 2006, we made important changes in several of the handouts and developed targeted homework exercises to be completed in the fourth week of the lab. A large improvement was seen in 2006: 75% of students received high scores on this ability.
These results appear to indicate that by providing students with the "wrong" type of support, we may hinder their learning. In 2005, when we tried to help students by telling them specifically what some of the assumptions implicit in the mathematical procedure were, their scores improved slightly; however, they were just repeating the answers given in the handout without any further elaboration. Interestingly, that same year they had great difficulty explaining the implications of these assumptions. We believe that by helping students too much we removed their agency. In this sense, too much scaffolding, or the wrong type, has detrimental effects.
Ability to evaluate the result by a second method. The revisions in 2005 were mostly focused on this ability (see Table III). This led to a significant decrease in the number of students who evaluated the result by a second method inadequately. There is a significant difference in the general performance on this ability between 2004 and 2005. In 2006 we did not make any additional changes in relation to this ability. However, as this ability is closely connected to the ability to evaluate uncertainties and the effects of assumptions, other improvements may have affected it as well. As a result, students scored higher on this ability even though we did not take any specific actions to improve it.
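The comparison itself can be sketched in a few lines; the numbers and the simple overlap criterion below are hypothetical illustrations, not the course's prescribed procedure:

```python
# Hypothetical sketch of the "evaluate the result by an independent
# method" check: two specific-heat results are taken to agree when
# their difference lies within the sum of their uncertainties.
c1, u1 = 590.0, 40.0   # J/(kg K): method 1 value and uncertainty (made up)
c2, u2 = 655.0, 35.0   # J/(kg K): method 2 value and uncertainty (made up)

difference = abs(c1 - c2)
combined = u1 + u2
consistent = difference <= combined
print(f"|c1 - c2| = {difference:.0f} J/(kg K), "
      f"combined uncertainty = {combined:.0f} J/(kg K), "
      f"consistent: {consistent}")
```

When the check fails, the rubric's highest level asks students to go one step further and discuss possible reasons for the discrepancy, such as invalid assumptions.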

VI. DISCUSSION AND IMPLICATIONS
In this paper we have shown that during the term of one semester, undergraduate students can develop competence in their exercise of scientific abilities (the processes, procedures, and strategies that scientists apply when developing new understandings or solving problems). We believe that learning of these complex abilities was so fast and efficient because of a thoughtfully designed learning environment. If we assume that the populations of students that took the course were alike,24 with similar aptitudes and comparable previous knowledge, the result implies that the modifications we made to the lab handouts, together with the homework targeted at the development of scientific abilities, increased the course effectiveness significantly.

TABLE VI. Chi-square values for the differences in scores in the lab on calorimetry (df = 3).
The research we have reported in this paper focused not only on instructional practice but also on product design, as we sought both to establish a model for developing innovations and to produce an apt set of instructional strategies and materials. We therefore intended to achieve two types of goals: first, we wanted to devise a viable and effective methodology to improve laboratory instruction; second, but no less important, we wanted to generate high-quality materials and determine their relevant features. We posed six research questions at the beginning of the paper. Below we discuss how we answered each of them.
Question 1: How does the "action-research" approach to the revision of the curriculum material work in practice?
According to our goals, this study can be classified as Design Research because it attempts to answer fundamental questions but is inspired by and rooted in practice. Three characteristics of Design Research apply to our work: (i) it addresses complex problems in real contexts, (ii) it connects design principles with the affordances of the environment to generate solutions, and (iii) it engages in reflective inquiry to refine instruction and to define new design principles.25,26 According to the revision procedure that we followed, our work can be framed in the action-research tradition because it can be described as a fine tuning of instruction to the learners' needs through repeated cycles of planning, implementation, and assessment.27 For the design and refinement of the ISLE lab curriculum and instructional materials, we followed a sequence of steps very similar to Dick and Carey's model of systematic design of instruction.8 First of all, we thoughtfully determined the instructional goals. The goals directed all the subsequent design decisions: the selection of the most appropriate experimental tasks and their progression, the choice of the laboratory equipment, the writing of handouts, and the selection of additional exercises. Our set of nuanced and detailed performance goals can be summarized in a single statement: students should demonstrate approaches and behaviors similar to those of scientists in the laboratory.
The design and refinement of the ISLE laboratories is a complex undertaking and therefore required multiple cycles of revision and implementation. We integrated the assessment for learning (AFL) paradigm into the action-research framework. In AFL, assessment should inform instruction.11 Assessment functions as a diagnostic tool to find out what students know and are able to do, determining the gaps in their knowledge so that instructors can adjust their teaching to learners' needs. In summary, the two key features of AFL are: (i) students' progress is continuously evaluated, and (ii) instruction is modified and remodeled to fit students.
Our approach to curriculum design is not foreign to the PER community but is firmly rooted in the research tradition started by McDermott, who has argued that research must guide the development of the physics curriculum.4 Researchers should first investigate student understanding and develop educational interventions tailored to learners' previous knowledge; then the instructional materials and strategies must be tested to find out whether they have a positive impact on students' learning and to determine under what circumstances a specific innovation is effective.28 The work by McDermott and the PER group at the University of Washington on the development of a curriculum for introductory electricity illustrates how to improve instruction by engaging in a three-phase cycle of design, test, and modification.29 They concluded that by revising the curriculum, they gained valuable knowledge of how students understand physics (p. 1012). The difference between our work and that of McDermott and colleagues is that we applied this approach to scientific abilities as opposed to the learning of physics content.
Question 2: How might instructors manage the introduction of new curricular materials? Should the changes be made and implemented simultaneously or progressively?
We have found that there is no need to implement all the changes at once; instructors can revise different aspects of the curriculum separately and introduce changes as they are made. This approach makes it easier to determine which effects correspond to which modification. In addition, revising materials and strategies gradually better fits the fine tuning necessary to respond to learners' needs and characteristics. For instance, in the fall semester of 2005 our revision efforts concentrated on students' treatment of uncertainties, and the following year we focused on their understanding of assumptions.
Question 3: What happens to the quality of student work when the students are required to put more effort toward a particular aspect of learning?

Some may argue that when instructors direct students' attention to the development of a particular scientific ability and expect students to answer more questions and complete more exercises, learners will neglect other aspects of the scientific investigation that they may have previously mastered. We have noticed that, although the cognitive load of ISLE labs is higher than in traditional labs and students tend to resent it during the first weeks, they are able to succeed and indeed thrive as the semester progresses. A period of adjustment is necessary, and the accomplishments can be explained by the different forms of scaffolding embedded in the labs: rubrics, prompts, questions, instructors' feedback, targeted exercises, and peer support (as students always work in groups). Our data clearly show that an increase in scores on a particular scientific ability is not accompanied by any reduction in the other abilities (see Figs. 1 and 2). Our complete list contains almost 50 different scientific abilities that we use to assess student work, and we have never observed that gains in one ability were associated with losses in another. In fact, disregarding some normal fluctuations, student improvement is steady and firm, as the scores for all the abilities increase (although at different rates) with time.30

Our results on students' improvement in scientific abilities might be affected by some other factors. Over three years, different course professors and different lab instructors have taught this course. One might ask whether the variability of instructors affected student outcomes. Variables are notoriously hard to isolate in educational research. Possible issues arising from the variability of instructors in our three-year study are in part mitigated by the fact that all course professors and almost all lab instructors have either been part of the PER group, or graduate students in the science education program, where they study many seminal ideas of PER. Lab instructor training was conducted by one of the researchers, so that lab instructors were aware of the goals, techniques, and pedagogical content associated with every lab. In addition, great care was taken to integrate the different components of the course to make all the curriculum materials and activities consistent with the ISLE approach, which is inherently student centered. Another limitation is that we do not have firm evidence that the student populations were alike to begin with in terms of their scientific abilities. However, the steadily increasing trend of improvement of scientific abilities over multiple years is a strong indication that it is the targeted instruction, materials, and student experience in labs that produced the positive outcome. A third limitation is that some instructors initially find the scientific abilities daunting to work with. Our response is that instructors can choose which abilities to focus on at different points in their course. Mastery of scientific abilities can be built gradually.
Question 4: What can be done to improve the quality of students' experimental investigations?
We have developed a curriculum to improve student learning in introductory physics labs in a large-enrollment course. The ISLE labs are our answer to the question of what can be done to improve the quality of introductory instructional labs. In these reformed laboratories, students learn, and are able to exhibit, the same processes employed by scientists in their research, over the length of a single semester. We have found that there is no need to oversimplify laboratory assignments to ensure that students are able to complete the tasks. Although it is true that engaging in scientific inquiry requires a very complex set of abilities and places considerable demands on students, they can meet the challenge and demonstrate these abilities when provided with the necessary support and learning opportunities. On a separate note, we want to clarify that students' learning of physics concepts does not suffer in ISLE, where they do not receive the right answers or procedures. We are able to make this claim because in 2006 we conducted an experiment splitting the students into two groups: one attended ISLE design labs and the other attended nondesign labs. The rest of the treatment was identical for the two groups. The students obtained similar grades on the common exams, but the nondesign students did not acquire scientific abilities as well as the design students.31

Question 5: What are some of the steps that instructors may take to better support students' lab work?
From the results of the study we draw the following implications for the features of the instructional materials. Lab handouts have to include prompts and questions to direct students' attention to important aspects of the experimental investigation that students would otherwise ignore. Scaffolding should be mainly heuristic, that is, directed toward the development of approaches and procedures instead of concepts.18 For those scientific abilities that present the biggest challenges, additional exercises tailored to address the difficulties may be given. Instructors might present students with accounts of actual research done by scientists, stressing the reasoning processes that underlie the investigations. Finally, rubrics are an indispensable tool to facilitate the development of scientific abilities, as they break down complex procedures into elemental steps and provide concrete descriptors of different levels of competence for each of them. In this paper we have shown that the approach is remarkably successful.

Question 6: When does instructor help become inefficient or counterproductive?
After establishing that scaffolding is indispensable for completing demanding tasks such as the ISLE design labs, we may ask whether there is any type of scaffolding that is inefficient or counterproductive for supporting students in their experimental work and in their learning. We have found that overscaffolding, or overfacilitating student work, had negative consequences. It is crucial that instructors do not remove agency from students.32,33 Students need to create their own solutions to be able to amend them later because, as we know, knowledge cannot be transmitted but must be recreated in the minds of learners.34 Instructors are often tempted to "help students too much." As a result, they dilute instruction, assigning simple one-step exercises in place of meaningful challenges. We think that scaffolding should be heuristic and as general as possible, designed to support student work but not to do the work for them. Results that point in this same direction were obtained by Davis when she studied the effects of prompts for reflection with a middle-school student population.35 She compared generic prompts (unspecific calls to think as students worked on their science projects) and directed prompts (which offer students hints about the direction in which to think). A priori, directed prompts may seem to be the better option because generic prompts appear insufficient. However, Davis found that, when working on complex science projects, the students who were given generic prompts were able to develop more articulate insights than the students who received directed prompts.
In the future, most students will not use most of the concepts that they learned in introductory physics courses, and many of them will not conduct scientific research (this does not mean we think that scientific abilities are irrelevant to everyday life). Therefore the most important aim of our introductory courses should be the creation of an essential cognitive and behavioral residue: the ability to learn on one's own. In our program we seek to have students become progressively acculturated into the scientific way of producing knowledge. For this reason, scaffolding is such a focal point of our work.
Score 2 (needs improvement): Most experimental uncertainties are evaluated correctly, though a few contain minor errors, inconsistencies, or omissions.
Score 3 (adequate): All experimental uncertainties are correctly evaluated and the final result is written with the percent uncertainty.

Is able to identify the assumptions made in using the mathematical procedure.
Score 0: No attempt is made to identify any assumptions.
Score 1: An attempt is made to identify assumptions, but most are irrelevant, described vaguely, or incorrect.
Score 2 (needs improvement): Most relevant assumptions are identified.
Score 3 (adequate): All relevant assumptions are identified.

Is able to determine specifically the way in which assumptions might affect the results.
Score 0: No attempt is made to determine the effects of assumptions.
Score 1: An attempt is made to determine the effects of some assumptions, but most are missing, described vaguely, or incorrect.
Score 2 (needs improvement): The effects of relevant assumptions are determined correctly.
Score 3 (adequate): The effects of relevant assumptions are correctly determined and the assumptions are validated.

Is able to evaluate the results by means of an independent method.
Score 0: No attempt is made to evaluate the consistency of the result using an independent method.
Score 1: A second independent method is used to evaluate the results; however, there is little or no discussion about the differences in the results due to the two methods.
Score 2 (needs improvement): A second independent method is used to evaluate the results, and the results of the two methods are compared using experimental uncertainties, but there is little or no discussion of the possible reasons for the differences when the results are different.
Score 3 (adequate): A second independent method is used to evaluate the results, the evaluation is done with the experimental uncertainties, and the discrepancy between the results of the two methods and its possible reasons are discussed.

APPENDIX B. LAB HANDOUT VERSIONS FOR THE CALORIMETRY LAB FOR THREE YEARS. CHANGES ARE SHOWN IN BOLD.

FIG. 1. Students' performance in labs 4 and 10. Bars represent the percentage of the students whose lab reports received high scores: either "adequate" (score of 3) or "needs improvement" (score of 2). We do not have data on the rubric "ability to evaluate the result by a second method" in lab 4 in year 2004.

FIG. 2. Student scientific abilities scores for the calorimetry lab during 2004-2006. Bars represent the percentage of the students whose lab reports received the scores shown at the bottom of the figure. Notice the decrease in the height of the white bars and the increase in the black ones.

TABLE I. Steps aimed at improving different abilities in the course.
Table IV presents the number and kind of changes that we introduced during 2005 and 2006 to the handout for the calorimetry lab.

TABLE III. Analysis of students' difficulties in 2005 and the steps we took in 2006 in response. The right column describes the change as related to a specific ability (labeled SA) with a corresponding rubric, or a general issue that is not assessed by rubrics.

TABLE IV. Number of steps aimed at improving different abilities in lab 10.

…why it is important to design two experiments to determine a quantity. Play with the equipment to find how you can use it to achieve the goal of the experiment. Come up with as many designs as possible… choose the best two designs. For each method, write the following in your lab report: (Note: Try to reduce the amount of writing you have to do by referring to earlier points. For example: if some of your assumptions are the same for both methods, just write "see method 1 assumptions.") (a) Write a verbal description and draw a labeled sketch of the design you chose… (b) List all assumptions you have made in your design. Estimate the relative uncertainty of your measurement.

Assumptions: After you recorded the reading of the scale, you noticed that the table on which the scale was sitting was tilted a little bit. You measure the angle of the tilt and find it to be about 10°. Can you assume that the table is not tilted?
You have access to the following equipment: water, Styrofoam container, weighing balance, thermometer, and timer. For each method, write the following in your lab report: (a) First, come up with as many designs as possible to determine the specific heat. Write a brief outline of each procedure you come up with. Then choose the best design. Indicate the criteria that you used to decide which design was "best." (b) Include a verbal description and a labeled sketch of the design you chose. (c) Construct the mathematical procedure you will use. (d) What physical quantities will you measure? (e) List the assumptions you make in your procedure. How could they affect the result? (f) What are the sources of experimental uncertainty? How would you minimize them? (g) Perform the experiment and record your measurements. Make a table for your measurements, if necessary. (h) Calculate the specific heat, based on your procedure and measurements. (i) After you have done both experiments, compare the two outcomes. Discuss if they are close to each other within your experimental uncertainty. That is, if the results are different, can the difference be explained by the assumptions you made in your procedure? (j) List any shortcomings in the experiment design and how you would address them. Decide why this activity was included in the lab. Think of real-life situations [and briefly describe them] in which you need to figure out things similar to this experiment.

(e) Calculate the specific heat… Include the experimental uncertainty in each value of specific heat that you determine. (f) After you have done both experiments, compare the two outcomes. Discuss if they are close to each other within your experimental uncertainty. If not, specifically explain what might have gone wrong; perhaps one of your assumptions was not valid. If your experimental results are not close to each other within experimental uncertainty, perform the experiment again, taking steps to improve your design. For example, you could take all measurements quickly so that hot objects do not cool off, or you could improve the thermal insulation of your calorimeter. Decide which of the assumptions affects your results most. Explain how the outcome of the experiment depends on this assumption, i.e., whether the assumption increases or decreases your result. (d) Design an additional experiment to determine whether the main assumption is valid in your experiment. Quantitatively estimate the effect of this assumption on the value of your measurement and compare it with the instrumental uncertainty… (e) List sources of experimental uncertainty… estimate the uncertainty in your result… Perform the experiment. Make sure you take steps to minimize experimental uncertainties and the effect of the assumptions… (f) After you have done both experiments, compare the two outcomes… If your experiments are not close to each other within experimental uncertainty, explain what might have gone wrong and perform the experiment again, taking steps to improve your design…