Patterns, correlates, and reduction of homework copying

Submissions to an online homework tutor were analyzed to determine whether they were copied. The fraction of copied submissions increased rapidly over the semester, as each weekly deadline approached, and for problems later in each assignment. The majority of students, who copied less than 10% of their problems, worked steadily over the three days prior to the deadline, whereas repetitive copiers (those who copied >30% of their submitted problems) exerted little effort early. Importantly, copying homework problems that require an analytic answer correlates with a decline of about two standard deviations (2σ) over the semester in relative score for similar problems on exams, but does not significantly correlate with the amount of conceptual learning as measured by pretesting and post-testing. An anonymous survey containing questions used in many previous studies of self-reported academic dishonesty showed ~1/3 less copying than was actually detected. The observed patterns of copying, free-response questions on the survey, and interview data suggest that time pressure on students who do not start their homework in a timely fashion is the proximate cause of copying. Several measures of initial ability in math or physics correlated with copying weakly or not at all. Changes in course format and instructional practices that previous self-reported academic dishonesty surveys and/or the observed copying patterns suggested would reduce copying have been accompanied by more than a factor of 4 reduction of copying, from ~11% of all electronic problems to less than 3%. As expected (since repetitive copiers have approximately three times the chance of failing), this was accompanied by a reduction in the overall course failure rate. Survey results indicate that students copy almost twice as much written homework as online homework and show that students nationally admit to more academic dishonesty than MIT students.


I. INTRODUCTION
Two thousand years ago, Imperial China went to great lengths to curb cheating on civil service exams, only to find that examinees invented increasingly clever ways to beat the system [1]. Today, as then, cheating on high-stakes exams is considered "very serious" [2], much more serious than "unauthorized collaboration" (e.g., working together on homework solutions), the only form of academic dishonesty that has significantly increased over the last 40 years [2]. The burden of this paper is that homework copying, although not regarded as nearly so morally wrong as exam cheating [2], is a serious educational problem that is associated with reduced learning and consequent course failure. We reach this conclusion by detecting copying of online homework, then showing that it follows distinctive temporal patterns and, importantly, that it correlates with test performance that declines by about two standard deviations over the course of the semester. Finally, we show that copying can be reduced, describing changes in course format that were accompanied by a fourfold reduction in copying.
This study approaches homework copying from multiple directions: developing algorithms to detect copying in an online tutorial system, discovering its temporal patterns, and determining its correlation with both academic outcomes and demographic factors. In order to place our results in the context of previous studies of academic dishonesty, we conducted a self-reported cheating survey [3] that shows that MIT students report less overall cheating than students nationally and that they generally consider cheating to be more morally reprehensible.
We then argue that homework copying, especially of written homework, is likely to be a severe problem nationally, and we recommend that instructors assign high priority to restructuring courses to reduce it. Finally, we show that certain changes in course format correlated with a dramatic reduction of copying at MIT.

II. DETECTING COPYING
We detected copying from the log of student interactions with a web-based Socratic tutorial homework system called MasteringPhysics.com that was used in four of the largest introductory calculus-based physics classes studied at MIT: mechanics (8.01) in Fall 2003, Fall 2004, and Fall 2005, and the follow-up electricity and magnetism (8.02) course in Spring 2006. (Calculus-based introductory physics is required of all MIT undergraduates.) Recent research on time to completion of problems in MasteringPhysics [2,4] enabled us to develop algorithms that give a probability that a particular submitted solution has been copied. We consider a problem (typically consisting of two to four related questions) to be copied if there is only a short time interval between the student opening it and answering all of its questions. A previous study [4] shows that the rate of students completing a problem, plotted as a function of the log of the time since each student opened the problem (see Fig. 1), generally shows three peaks: one centered around 10 min due to "real time solvers," who typically make at least one mistake and often ask for hints or for subtasks that lie on the route to the solution; a second centered close to 1 min due to "quick solvers," who rarely make mistakes; and a peak a day or two later due to "delayed solvers." Since 1 min is insufficient time to read the problem and enter the several answers typically required [4], we infer that the quick solver group is copying the answer from somewhere.
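A rate-of-completion curve such as Fig. 1 is essentially a histogram of completion times on a logarithmic axis. The minimal Python sketch below bins completion times in ln(seconds); the bin width, the function name `log_time_histogram`, and the sample times are hypothetical choices for illustration, not values or code from this study. With data like these, quick solvers and real time solvers appear as separate clusters near 1 min and 10 min.

```python
import math
from collections import Counter


def log_time_histogram(times_s, bin_width=0.25):
    """Bin completion times on a natural-log axis, as in a rate-of-completion curve.

    times_s: seconds from opening a problem to answering all of its questions.
    bin_width: width of each bin in units of ln(seconds); 0.25 is an arbitrary choice.
    Returns (bin centre in ln-seconds, count) pairs in order of increasing time.
    """
    counts = Counter()
    for t in times_s:
        if t <= 0:
            continue  # ln(t) is undefined, so non-positive times are skipped
        counts[math.floor(math.log(t) / bin_width)] += 1
    return [((k + 0.5) * bin_width, counts[k]) for k in sorted(counts)]


# Hypothetical completion times: three near 1 min, three near 10 min, one a day later.
times = [55, 60, 65, 550, 600, 650, 86_400]
for centre, n in log_time_histogram(times):
    print(f"~{math.exp(centre) / 60:8.1f} min: {n}")
```

Plotting counts against ln(t) rather than t keeps the 1 min, 10 min, and one-day populations on one readable axis.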
We now argue that our quick solver definition of copying is close to a more pedagogically relevant definition: "obtain and submit an answer with essentially no intellectual engagement with the question." This involves considering how our algorithm might produce "false positives" or "false negatives" relative to this more pedagogical definition.
False positives result if the student independently works out the solution to a problem before opening the problem in his browser. False positives are suppressed by the design of MasteringPhysics: it has no provision for printing out the entire assignment and allows a student to view or print out only individual problems. Moreover, some questions in many multipart problems are blocked out until the previous question has been answered. In principle, students could work printed problems that they obtained from other students without looking at the answers. However, we believe this occurred infrequently, if at all, since no students, either in the open-ended survey questions or in the interviews (both discussed later), said anything like "I got a printout of the questions from a friend and did them on my own."

Our quick solver criterion produces some false negatives because our "short time equals copying" algorithm excludes some scenarios in which students obtained the solution without significant intellectual engagement. According to our interviews, students may open several problems, think about only a few, then obtain solutions at a collaborative problem-solving session or by text messaging a friend for the answer. Such problems would not be answered soon enough to register as being copied. An upper bound to false negatives from such behaviors is provided by the fact that errorless submissions are a hallmark of over 90% of the quick responses we judge as copied (see Fig. 1), but only ~1-2% of the submissions made more than 20 min after opening the problem contain no errors. The interviews revealed that a few students might copy but deliberately take additional time to avoid detection. Although 2-3% of the submissions in the range 3-25 min contain no mistakes, the relatively early temporal distribution of these suggests that they are among the first of the real time solvers. (We think online responses from friends would not be so uniformly prompt.) It is important to note that, to the extent that there were either false negatives or false positives, the dramatic patterns and correlations of copying with time and academic performance reported here underestimate the true effect size of pedagogically significant copying, because misclassification mixes copiers with noncopiers and dilutes the differences between them.
FIG. 1. (Color) Behavior underlying a typical rate-of-completion curve. The total curve is shown by the filled squares; the other three curves show the breakdown of the total curve depending on whether there were hint requests and wrong answer submissions. The use of log_e(t) as the independent variable is discussed in Ref. [4].

Our copy algorithm uses several steps to calculate the probability that a problem done in a given amount of time is copied. First, we fit the completion rate vs log_e(time) curve (e.g., Fig. 1, in the range up to 1 h) for each problem separately using two Gaussians [4]. Then we found the best fit to the location and width of the quick solver peak of the form a + b × N_parts, reasoning that it takes a similar time to open each problem and an additional increment to enter each answer. Then the data were refit, but with the quick solver Gaussian constrained to the location and width appropriate for the number of answers required for that problem. We used the ratio of the quick solver Gaussian to the sum of it plus the real time solver Gaussian to determine the probability of copying as a function of time to completion for that problem. We fit different a's and b's for the 2006 data because MasteringPhysics switched to a symbolic rather than text-string equation input, which took a bit longer to enter, an expectation borne out by our fit of time to copy vs number of parts. Our procedure could not be applied to ~13% of the problems, including almost all multiple choice problems, because the time to answer did not resolve into two separable peaks. We assumed that these problems were copied at the same rate as those for which this method applied, probably slightly inflating the amount of copying, since students are significantly less likely to copy problems that can be answered quickly and/or by guessing. The overall fraction of problems copied as determined with the time-based method described above is summarized in Table I in the penultimate section of this paper.
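The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`fit_line`, `gaussian`, `copy_probability`) are hypothetical; the linear form a + b × N_parts is applied here to the quick-solver peak time in seconds and then converted to ln(t); the quick-peak width (0.4) and all Gaussian amplitudes and positions are made-up values rather than fitted ones.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def gaussian(x, amp, mu, sigma):
    """Unnormalized Gaussian modelling one solver population on the ln(t) axis."""
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def copy_probability(t_seconds, quick, real):
    """Quick-solver Gaussian divided by (quick + real time solver Gaussians) at ln(t).

    quick, real: (amplitude, mean, width) tuples, with mean and width in ln(seconds).
    """
    x = math.log(t_seconds)
    g_quick = gaussian(x, *quick)
    g_real = gaussian(x, *real)
    total = g_quick + g_real
    if total == 0.0:
        return 0.0  # both populations negligible (floating-point underflow)
    return g_quick / total


# Hypothetical quick-solver peak times (s) for problems with 1-4 answer parts.
a, b = fit_line([1, 2, 3, 4], [40, 60, 80, 100])   # peak time = a + b * N_parts
n_parts = 3
quick = (1.0, math.log(a + b * n_parts), 0.4)      # constrained quick-solver peak
real = (3.0, math.log(600), 0.8)                   # real time solvers near 10 min
for t in (60, 120, 300, 600):
    print(t, round(copy_probability(t, quick, real), 3))
```

In a real analysis the amplitudes, means, and widths would come from the per-problem two-Gaussian fits; the ratio then gives a copying probability that falls smoothly from near 1 at the quick peak to near 0 at the real time peak.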
The just-described criterion is simpler, quicker, and easier to implement than the cheating-detection algorithms used in previous research by our group [2]. Those algorithms combined the quickness criterion with a Bayesian algorithm based on several additional factors. The Bayesian algorithm was slightly less sensitive and found 10-20% fewer students in the repetitive copier group. The major findings, correlations, and conclusions of the analysis here are unchanged from those reported previously, and figures and tables credited to Ref. [2] are sometimes used here.

III. TEMPORAL PATTERNS
Unlike previous detections of individual occurrences of academic dishonesty (e.g., copying of exams and plagiarism of written work), this study follows one type of academic dishonesty continuously over a semester. This allows us to discern temporal, behavioral, and academic patterns that differentiate copiers and noncopiers and that suggest causes of and ways to reduce copying.
We have elected to present the detailed patterns of copying for only the Fall 2003 course because its lecture-recitation format is typical of large introductory courses nationwide and because it contained the majority of the copying observed in all four courses studied.
In Fall 2003, N = 428 students were offered three lectures (around 215 students in each of two lectures given at two different times) and two faculty-taught recitations each week. Attendance at these was not required and averaged around 60%. Students also completed two homework assignments per week: one electronic homework assignment in MasteringPhysics (~10% of the overall grade) and a written homework assignment (~7% of the grade). This class did not include a laboratory component. (All other classes from 2005 onward were taught in studio physics format [5] with ~75 students per section.) The 2003 class was broken into four groups: heavy copiers, who copied more than 50% of their electronic homework (~10% of all students); moderate copiers, at 30-50% (~10%); light copiers, who copied 10-30% of their electronic homework (~29%); and the majority, who copied less than 10% (~51%). While this last group contains some students who copied some problems, many did not copy at all.
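The four-way grouping is a threshold rule on each student's fraction of copied electronic problems. A minimal Python sketch, with hypothetical function names; how the exact boundary values (0.1, 0.3, 0.5) are assigned is an assumption, since the text gives only the ranges:

```python
def copier_group(copy_fraction):
    """Map a student's fraction of copied electronic problems to the 2003 groups.

    Boundary handling (e.g., whether exactly 0.30 counts as light or moderate)
    is an assumption; the text gives only the ranges.
    """
    if not 0.0 <= copy_fraction <= 1.0:
        raise ValueError("copy fraction must lie in [0, 1]")
    if copy_fraction > 0.5:
        return "heavy"
    if copy_fraction > 0.3:
        return "moderate"
    if copy_fraction >= 0.1:
        return "light"
    return "majority"


def is_repetitive_copier(copy_fraction):
    """Repetitive copiers are the heavy plus moderate groups (more than 30% copied)."""
    return copier_group(copy_fraction) in ("heavy", "moderate")


# Hypothetical per-student copy fractions.
fractions = [0.0, 0.05, 0.12, 0.35, 0.62]
print([copier_group(f) for f in fractions])
# ['majority', 'majority', 'light', 'moderate', 'heavy']
```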
The most significant temporal pattern is the marked increase in copying over the course of the semester ͑Fig. 2͒.
Copying grows rapidly in the first three weeks, probably reflecting increased academic load as well as the time needed to form social networks that facilitate copying. A second increase occurs after the midterm exams (assignment 8). Unlike regular assignments, those marked R were reviews and did not earn credit toward the final grade. (Neither these assignments nor the occasional practice problems on regular assignments were included in the copying statistics.) The second noteworthy pattern (Fig. 3) is the fraction of problems completed over the 7 day assignment cycle that ended at 10 p.m. Tuesday evening. The majority group (<10% of their problems copied) does their work in a timely fashion, working steadily over the three days before the due time and completing ~1/2 of their problems two days before they are due. (This result was surprising to ~95% of ~150 faculty, who typically guessed 10-20% when asked to estimate how much homework was completed by the majority group two nights before the deadline.) The repetitive copier group (>0.3) typically does only ~10% of their work two days early, leaves almost 60% of the assignment to the final six hours, and leaves about 15% until after it is due. Repetitive copiers are more than three times as likely to complete the assignment after the deadline, with resulting loss of credit. Even though their exam scores steadily decrease, increasing their risk of failing exams to about 50% by the final, they did successively fewer of the ungraded practice problems made available prior to exams over the term, declining from nearly 50% to less than 40%; the remaining students held steady at around 60% (see Ref. [2]). Clearly, repetitive copying of online homework is associated with other signs of not exerting timely and sufficient effort. Figure 4 shows that the copy rate increases as the deadline approaches and passes. But even repetitive copiers do not copy heavily on those few problems they complete a day in advance of the deadline.

FIG. 3. Fraction of problems complete at 11:59 p.m. on a given day or at a particular time on the due date (due time is 20 h). Students with lower copying fraction start their work much earlier than heavy copiers, who have three times more unfinished problems at the due time.
One suggestion that emerges from these data is that there does not seem to be a moral threshold that, once crossed, leads to much more copying. We see many students who copy only a handful of problems over the entire semester, but this looks more like real copying in response to the usual pressures than like false positives of the detection. The view that pressure affects even students of good moral character is supported by the observation that the <10% copiers do copy in response to pressure right before the midterm break, copy less after the break when pressure is lower, but increase their copying again in the last two weeks.
Students are more likely to copy a problem if it is more difficult, if it is later in the assignment, if they do it closer to the deadline (see Fig. 4), or if the assignment is later in the term. A multiregression of these factors gives

$$\text{copy fraction} = -0.0736 + 0.0137 \times \langle\text{difficulty}\rangle + 0.0201 \times (\text{assignment order}) + 0.0086 \times (\text{problem order}),$$

where difficulty is determined by the difficulty algorithm in MasteringPhysics and ranges over 0.259-6.806, and the assignments (problems) are numbered starting from 1 at the beginning of the semester (assignment). This expression fits individual student copy rates with r = 0.63 and an average error of 0.045.
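Evaluating the multiregression for a particular problem is simple arithmetic. The Python sketch below assumes, as in the equation above, that the 0.0086 coefficient multiplies the problem's position within its assignment; the function name and the example inputs are hypothetical.

```python
def predicted_copy_fraction(difficulty, assignment_order, problem_order):
    """Evaluate the 2003 multiregression for the expected copy fraction of a problem.

    difficulty: MasteringPhysics difficulty (reported range 0.259-6.806).
    assignment_order, problem_order: both numbered from 1.
    Assumes the 0.0086 coefficient multiplies the problem's position in its assignment.
    """
    if not 0.259 <= difficulty <= 6.806:
        raise ValueError("difficulty outside the reported range")
    return (-0.0736
            + 0.0137 * difficulty
            + 0.0201 * assignment_order
            + 0.0086 * problem_order)


# A hypothetical difficulty-3 problem, sixth in the tenth assignment:
print(round(predicted_copy_fraction(3.0, 10, 6), 4))  # 0.2201
```

Because the fit is linear, it dips slightly below zero for easy problems early in the term (difficulty 1, first problem of assignment 1 gives about −0.03), so it is best read as a description of trends rather than as a per-problem prediction.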
Copying may also rise nonlinearly with a problem's position in the assignment: it averages about 11% up to problem 6, then increases to ~19% for problem 10 (see Fig. 5).

A. Course examinations
The most striking correlate of repeated homework copying is severely declining performance relative to the class average over the five primary assessments: the mechanics baseline test (MBT) pretest given the first day, three 1 h examinations, and one 3 h final examination. As shown in Fig. 6, the average scores of all copying groups were within combined standard errors of the mean of each other on the MBT, a conceptual and computational test [6] of much the same physics as on the first examination. The two groups of repetitive homework copiers (those who copied >30% of their problems) scored progressively lower on all but one successive test over the semester (Fig. 6). On the final exam, heavy copiers (>50% of problems copied) scored 1.3 standard deviations below the low-copying group of students. Since they copied about 62% of their homework, we would infer a difference of roughly two standard deviations on the final exam for students copying all of their homework vs none, a result consistent with the slope of β = −2.42 ± 0.23 for the regression fit to the final exam score vs fraction of homework copied in Fig. 7. This confirms the ~2σ effect-size improvement on the final exam found for students who completed their assigned MasteringPhysics problems vs the extrapolation for those who completed none [7].
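The extrapolation from the heavy copiers to 100% copying is a one-line estimate. Treating the low-copying group's copy fraction as zero (an approximation, since that group copied up to 10%):

$$\frac{\Delta Z_{\text{final}}}{\Delta(\text{copy fraction})} \approx \frac{-1.3}{0.62 - 0} \approx -2.1 \ \text{standard deviations per 100\% copying}$$

Because the low-copying group did copy a little, the true difference in copy fraction is somewhat smaller than 0.62, which would push this estimate toward the fitted slope β = −2.42 ± 0.23.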
FIG. 4. Fraction of problems completed in the previous interval that are copied. Repetitive copiers typically do a very few easy problems, mostly by themselves, two days prior to the due date and copy later problems heavily.

FIG. 5. (Color) Copy fraction vs problem location (from Ref. [2]) is higher for problems that are later in the assignment, with a correlation coefficient r = 0.93 (from linear regression). This is due in part to later problems being more difficult. (The correlation coefficient between copy fraction and difficulty is r = 0.35.)

The large relative correlation of copying with final exam scores is indicated by an algorithm that predicts the final exam score from other indications of behavior and performance. It was developed to select students after the first exam who appeared at risk of failing the final exam [2]. The final exam score is predicted from a multiregression on students' homework copying (C), first midterm score (X1), skill as determined from the online homework tutor (S), and MBT pretest score (D), with all variables normalized in terms of standard deviations of the class (Z scores); the written homework grade was an insignificant predictor. The fit had r = 0.69 ± 0.05. This is evidence that, contrary to the typical belief of American students that innate ability (i.e., the MBT prescore) is the principal determiner of exam success, doing all assigned work is a surer route to exam success than innate physics ability.
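Both this multiregression and the exam comparisons in Fig. 6 work in Z scores, (score − class average)/(class standard deviation). The minimal Python sketch below shows only that normalization step (the regression coefficients are not reproduced here); the function name and sample scores are hypothetical, and using the population rather than the sample standard deviation is an assumption.

```python
import statistics


def z_scores(values):
    """Convert raw scores to Z scores: (score - class average) / class standard deviation.

    Uses the population standard deviation of the whole class; whether the study
    used the population or sample form is an assumption here.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all scores are equal, so Z scores are undefined")
    return [(v - mean) / sd for v in values]


# Hypothetical final exam scores for a class of five.
print([round(z, 2) for z in z_scores([52, 61, 70, 79, 88])])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Expressing every predictor and the outcome in these units is what lets regression slopes be quoted as "standard deviations per 100% copying."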

B. Mechanics baseline test vs copying
Given the strong decline in test scores with increased copying for exam problems requiring analytic responses, we were surprised to find an insignificant correlation (r = −0.03) between the amount of copying and a student's conceptual learning as measured by normalized gain on the MBT [8]. This was an order of magnitude smaller than the highly significant correlation, r = −0.43, between final exam score and copying fraction. The repetitive copiers appeared marginally weaker on the MBT pretest (slope of −0.48 ± 0.27) and just significantly weaker on the post-test (slope of −0.61 ± 0.27), with a resulting insignificant difference between the pretest and post-test dependences of MBT score on the amount of copying (the difference, 0.13, is ~0.3 of the combined error bar of ~0.4; p ~ 0.7).
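The quantities in this paragraph can be sketched in a few lines of Python. We assume Ref. [8] uses the standard normalized gain g = (post − pre)/(100 − pre) with scores in percent; the function names are hypothetical. The slope-difference test treats the pretest and post-test slopes as independent and normally distributed, a simplification (they come from the same students) that nonetheless reproduces the quoted p ~ 0.7.

```python
import math


def normalized_gain(pre_pct, post_pct):
    """Normalized gain g = (post - pre) / (100 - pre), with scores in percent."""
    if pre_pct >= 100:
        raise ValueError("gain is undefined for a perfect pretest score")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope_difference_p(b1, se1, b2, se2):
    """Two-sided p value for b1 - b2, treating the slopes as independent and normal."""
    z = abs(b1 - b2) / math.hypot(se1, se2)  # difference in units of combined error
    return math.erfc(z / math.sqrt(2))


# Pretest and post-test slopes vs copy fraction quoted in the text.
print(round(slope_difference_p(-0.48, 0.27, -0.61, 0.27), 2))  # 0.73
```

The correlation r = −0.03 would be `pearson_r(copy_fractions, gains)` computed over individual students, with each gain from `normalized_gain`.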
Why do repetitive copiers show the same learning on the MBT as students who copy much less (improving ~1.2 standard deviations), whereas they learn around 1.5 standard deviations less on analytic problem solving? The most plausible explanation is that most repetitive copiers did the MasteringPhysics problems relevant to the MBT. This test covers only material in the first half of the semester, when repetitive copiers copy only ~20% of their problems; furthermore, the relevant concept questions are in the first half of each assignment, where their copy rate is even lower. Students may also learn MBT-type material from other elements of the course (e.g., lectures, recitations, textbook, or talking to other students) unaffected by homework copying. This is strong evidence that repetitive copiers can learn physics as quickly and as well as their colleagues if they try.
The difference between the correlations of homework copying with learning outcomes on exam problems requiring analytic responses and on the MBT is a very strong and extraordinarily specific result by educational standards. It is comparable to the finding [9] that peer instruction, relative to traditional instruction, produces much larger increases in conceptual learning (learning effect of ~1) than in scores on traditional exam questions. Our finding also suggests that doing (vs copying) analytic problems involving angular momentum or gravitation and planetary orbits (topics covered later in the term or assignment) contributes little to increasing MBT scores, even though these problems involve many topics on the MBT.
FIG. 6. Exam Z scores [(score − average)/(standard deviation)] show a marked decrease for the moderate and heavy copiers on exams further into the term in 2003, relative to the MBT pretest given the first day. The scale at right shows letter grades corresponding to the Z score. The curves guide the eye.

Finally, we address the effect size associated with copying using several different approaches. The first approach is to measure the learning on the analytic final exam problems by using a multiregression involving the MBT pretest and the overall copy rate. This gives

$$Z_{\text{final}} = -2.32 \times (\text{copy rate}) + 0.094 \times Z_{\text{MBT pre}},$$

suggesting a slope of learning vs copy rate of −2.3 standard deviations per 100% copying. The second approach is motivated by the hypothesis that copying causes lower scores. In this case, only copying prior to each exam should correlate with score degradation on that exam. But since the exams emphasize recent material, recent copying should be more important. Thus we plot (for each exam after the pretest) the slope of score vs weighted copy fraction for that exam. For each exam, the most recent period is weighted 1.0 and all previous intervals are weighted 0.2 in total (for the final, this weight is increased to 0.3) to account for our estimate of the exam's coverage of material from earlier in the semester. This graph, shown in Fig. 8, indicates a roughly constant effect of prior copying, consistent with this hypothesis. Importantly, it shows that although the amount of copying early in the term is small (and hence the error bar is larger), it correlates with nonlearning at the same rate as does the increased copying later in the semester. (Incidentally, since copying increases roughly linearly during the semester, the hypothesis that lower test scores cause subsequent copying is also supported by the data, but at a lower rate of ~1.2 standard deviations of test score per 100% copying.)
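The weighted copy fraction used for Fig. 8 can be sketched as below. The weights (1.0 for the most recent interval; 0.2 in total for earlier ones, 0.3 for the final) come from the text; splitting the earlier weight equally among earlier intervals and dividing by the total weight are assumptions of this Python sketch, and the function name and sample fractions are hypothetical.

```python
def weighted_copy_fraction(interval_fractions, previous_weight=0.2):
    """Copy fraction weighted toward the material an exam emphasizes.

    interval_fractions: copy fractions for successive intervals of the term,
        oldest first; the last entry is the interval just before the exam.
    previous_weight: total weight shared by all earlier intervals
        (0.2 for midterms, 0.3 for the final, per the text).
    The most recent interval gets weight 1.0. Splitting previous_weight equally
    and dividing by the total weight are assumptions made for this sketch.
    """
    if not interval_fractions:
        raise ValueError("need at least one interval")
    *previous, recent = interval_fractions
    weighted_sum, total_weight = recent, 1.0
    if previous:
        weighted_sum += previous_weight / len(previous) * sum(previous)
        total_weight += previous_weight
    return weighted_sum / total_weight


# Hypothetical copy fractions for the intervals before exams 1-3 and the final.
intervals = [0.10, 0.20, 0.30, 0.40]
for k in range(1, 4):
    print(f"exam {k}: {weighted_copy_fraction(intervals[:k]):.3f}")
print(f"final: {weighted_copy_fraction(intervals, previous_weight=0.3):.3f}")
```

Each point in a plot like Fig. 8 would then be the slope from regressing that exam's Z scores on these per-student weighted fractions.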

C. Attrition rate
Repetitive copiers have a much higher attrition rate during the two-term introductory sequence (see Fig. 9), as one might expect from their lower exam scores in mechanics, and their poor foundation, copying, and/or lackadaisical study habits seem to reduce their performance in the following electricity and magnetism course (8.02). Over the two-semester sequence, repetitive homework copiers (>30% copied) exhibited a 20% attrition rate compared to 5.9% for all other students, an unfortunate result for students who started the year with math and physics skills essentially equal to those of their classmates.
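As a rough consistency check, assume (as in the 2003 grouping) that repetitive copiers make up about 20% of the class. Their attrition relative to other students, and their share of all students who fail to complete the sequence, are then

$$\frac{20\%}{5.9\%} \approx 3.4, \qquad \frac{0.20 \times 0.20}{0.20 \times 0.20 + 0.80 \times 0.059} = \frac{0.040}{0.0872} \approx 0.46,$$

in line with "approximately three times the chance of failing" and with the 47% share quoted for Fig. 9.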

V. SELF-REPORTED CHEATING SURVEY AND FOLLOW-UP INTERVIEWS
To compare our observations with the extensive body of work on academic dishonesty, mostly in the form of self-reported surveys (summarized in Ref. [2]), we administered an academic dishonesty survey to our students in Spring 2006, primarily investigating copying behavior in Fall 2005. Although the 2005 course used studio physics instead of the lecture-recitation format used for the 2003 data described above, we feel that we were studying the same type of copying and that students have the same motivations and techniques for copying, since the 2005 data showed the same trends (strong rise over the semester, procrastination by the repetitive copiers, the same slope of analytic final exam scores vs copy fraction, −2.06 ± 0.34, etc.), albeit at ~1/2 of the 2003 rate.

FIG. 7. (Color) Contrast of conceptual and analytic problem scores vs homework copying. (a) Linear fits to pretest and post-test scores on the MBT. (b) Score on final exam problems requiring analytic responses has a slope of −2.42 ± 0.23 standard deviations per 100% copied (equivalent letter grades on right). The analytic test scores were shifted so that 100% copying would give no learning. (c) Learning effect (postscore − prescore, shown with open circles) on the MBT is 1.24 standard deviations (p < 0.0001), independent of homework copying fraction, contrasted with the learning effect on analytic problems (solid points). Note that it lacks the marginally significant structure seen at <0.1 copy fraction in both prescore and postscore. (Similar structure was not present in the 2005 data in Ref. [2].) The fits are to all individual students; the error bars group the students into bins of maximum width ~0.13 and a maximum of 58 students.
We had several objectives in designing this survey: (1) to ask both multiple choice and open-ended questions to elicit details on the mechanisms and motivations for homework copying; (2) to include some questions identical (in both wording and format) to McCabe's widely administered integrity survey [10] in order to compare MIT students with national norms; (3) to include more quantitative questions to facilitate quantitative comparison of self-reported copy rates with measured rates (and to calibrate the less specific questions on the integrity survey [10]); and (4) to test whether the situational and demographic factors found to correlate with more self-reported copying in the earlier studies [2] were correlated with measured copying.
The details of this survey will be published in Ref. [3]. Briefly, we found that students actually copy about 50% more than they reported on the survey [3] (equivalently, the survey showed about 1/3 less copying than was detected). We showed that actual copying (from both 2003 and 2005 data) correlated with demographic factors found in previous self-reported dishonesty surveys: being male [11] and being a business major [12]. (Since our freshmen had not declared a major when they took the survey, we showed that copying is a leading indicator of becoming a business major.) Our survey's focus on the underlying motivations of these students confirmed earlier indications by Sandoe and Milliron [13] and Newstead [14] that students motivated by learning rather than by obtaining grades or credit report that they cheat less. Orientation toward understanding, either within their major or within this course, correlated with reduced self-reported written and electronic homework copying by an average of 40%, and these factors were multiplicative. We found no effect of being older, being an upperclass student, or ethnicity.
Our survey shows that copying written homework is more prevalent than copying electronic homework. MIT students self-report about 75% more copying on written homework (~6.5% of all problems and 1.55 times in the last year) than on electronic homework (~3.8% and 0.84 times in the last year). This may be because the most common self-reported mechanisms for copying written homework on our survey, "copying a borrowed assignment" (58% of survey responses) and "finding the solution online" (34%, often using the MIT OpenCourseWare site), are not available avenues for the electronic homework.¹ Furthermore, online students who are stuck on a problem or unsure whether their solution is correct benefit from the feedback and hints available, reducing the need to "borrow" others' assignments.
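The "about 75% more" figure lies between the two self-report measures:

$$\frac{6.5\%}{3.8\%} \approx 1.71, \qquad \frac{1.55}{0.84} \approx 1.85,$$

that is, roughly 70-85% more copying of written than of electronic homework by either measure.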
MIT students typically reported a factor of 2 less academic dishonesty than the national average, although they were comparable to the national average on questions pertaining to obtaining outside help on assignments. Also, they felt that academic dishonesty at MIT was ~30% less prevalent than did students nationally (except only ~10% less on "inappropriate sharing on group assignments"). MIT students were also about 30% less tolerant of most forms of academic dishonesty, including "collaborative working of homework," than the national average.
Conducting interviews about homework copying proved problematic because students were extremely reluctant to discuss their copying, even months after the course was over. We got no responses to ~15 email invitations for interviews, then invited about 40 of the most frequent copiers for interviews, offering $50 per half hour. Only one of the approximately five who agreed to be interviewed actually kept the appointment. Calling individual frequent copiers ultimately led to about four emails or phone conversations in which all of our questions were answered and another four conversations in which the student offered an off-the-record conversation about copying. (Several students declined payment.) This lack of response was specific to copying, as we found out when investigating what we suspected was a false positive: our algorithm showed that a few students copied practice problems (albeit at a rate around 6% or less) more than for-credit problems. An email questioning why this might have occurred elicited about a 30% response rate. Although our email deliberately suggested that this reflected collaboration on the practice problems, all of the responses attributed this to guessing, which they claimed not to do on for-credit problems, where it carried a small grade penalty. In comparison, an earlier email asking ~40 students to explain how they had answered a question so quickly elicited no responses. Interviews or conversations that did occur addressed study habits, collaborative behavior, mechanisms of copying, and especially deeper motivation and personal feelings about copying.

¹ Pearson regularly searches for posted solutions to its MasteringPhysics problems and requests that they be removed from the web.

FIG. 9. Course failure plus drop rate (in percent) vs copying fraction. The 20% of students who copy over 30% of their homework constitute 47% of the students who fail to complete 8.01 and 8.02 in two semesters.
Our survey's open-ended questions and interviews give a rough indication of the sources of copied answers. The 25 responses to the survey question on copying electronic homework broke down as follows: ~55% asked friends (usually electronically and, as we argued previously, prior to examining the problem), ~25% attended collaborative problem-solving groups, ~20% logged into a friend's homework account, and a few created and distributed answer files. The interviewees offered little more insight into mechanisms of copying, generally confirming methods mentioned in the survey. Students readily admitted to group problem-solving sessions, and one or two admitted to pasting from one browser to another or circulating files of answers.

VI. WHY DO STUDENTS COPY THEIR HOMEWORK?
At MIT, as elsewhere, students and faculty agree that doing homework is essential to learning to solve problems like those on the tests. Why then do our students engage in the acknowledged self-destructive behavior of homework copying?
The differences in temporal patterns between repetitive copiers and other students suggest the proximate causes: they put very little effort into their homework until the last day before the deadline and are several times more likely not to finish by the deadline. In addition, homework copying increases over the term, increasing very significantly after midterms. These observations are consistent with the explanation that students copy homework in response to time pressures that build over the term and are exacerbated by delaying the start of serious work on the weekly assignment until the day it is due. However, the strong correlation between task orientation and copying suggests that students primarily oriented toward obtaining grades rather than learning predominate among those who choose not to invest sufficient timely effort in their assignments.
Our survey confirmed time pressure as the prime factor to which students attribute their copying. Responses to "if you copied homework… please indicate your reasons" were "Lack of time due to other classes" (26%), "Problems were too difficult" (25%), "Problems took too much time" (13%), and "Don't care about learning physics" (3%).
Responses to the open-ended supplemental question "Please elaborate if you have any other reasons for copying homework" confirmed these reasons: ~47% of such responses cited time limitations as an important reason that individuals copied homework, and ~37% cited the difficulty of the problems.
The interviews and open-ended responses on the survey revealed several rationalizations for copying. Responses included: "Copying didn't affect my grade because all I wanted to do was to pass," "it is us against you faculty," "I knew this pretty well from my high school physics course so it was only review," "not motivated to learn physics because I don't enjoy it and it's not needed for my major," "not interested," "cheating isn't bad because it hurts only you at test time," and two students said it was too trivial to waste time on. Not surprisingly, the heaviest self-reported copiers felt that copying homework was not a serious moral offense.
Many people have suggested to us that copying is the result of students being unable to complete the homework due to weak academic skills in spite of exerting good effort, but this supposition is contradicted by our data on several counts.
(1) Copiers exert much less effort on their homework in the days prior to the due day.
(2) No students cited poor academic skills as a reason for copying on the survey (although finding the problems "difficult" is a symptom of this).
(3) Math skills (measured by the SAT II test that is required of MIT applicants) correlated strongly with the final exam score but not significantly with the amount of homework copying (however, copying shows a small negative correlation with the level of the math course that the students were taking).
(4) The initial physics skills of serious copiers (who copy >50%) are lower than those of the main group (<30%) by an insignificant 0.3 standard deviation on the MBT pretest and by a barely significant 0.4 standard deviation among those (only 39%) who reported a SAT II Physics test score. These weak differences cannot plausibly explain the difference of well over 1 standard deviation in performance on the analytic problems on the final.
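Gaps quoted "in standard deviations" are standardized mean differences: the difference between two group means divided by a measure of spread. The sketch below assumes the yardstick is the population standard deviation of the whole class's scores; the exact normalization used for the MBT and SAT II comparisons is not stated in this section, and the helper name and toy scores are hypothetical.

```python
from statistics import mean, pstdev


def standardized_gap(group_a, group_b, all_scores):
    """Difference of group means in units of the whole-class standard deviation.

    Hypothetical helper: assumes the class-wide population SD is the yardstick.
    """
    sd = pstdev(all_scores)
    if sd == 0:
        raise ValueError("class scores have zero spread")
    return (mean(group_a) - mean(group_b)) / sd


# Toy pretest scores (hypothetical, not the study's data).
main_group = [1, 3]
serious_copiers = [-1, 1]
gap = standardized_gap(main_group, serious_copiers, main_group + serious_copiers)
print(round(gap, 3))  # 1.414
```

Expressed on this common scale, the copiers' pretest deficit of 0.3 to 0.4 standard deviation can be compared directly with the gap of well over 1 standard deviation on the final's analytic problems.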
In summary, by far the strongest correlate of copying is delaying the start of effort on the homework until close to the due time. Lack of skill is a weak correlate of copying. The strong correlation of demographic factors with copying suggests that this lack of effort, and the associated copying, is in part a conscious decision. Students who are more interested in business than in science or engineering, in getting an MIT degree than in learning their major subject, and in obtaining a passing grade than in learning introductory physics, and who consider copying homework less morally wrong than other students do, are far more likely to fail to allocate (perhaps by choice) enough time before the due day to make much progress on their homework, and then to copy it in order to receive the credit; this group is predominantly male.

VII. EVIDENCE THAT DOING HOMEWORK CAUSES SPECIFIC LEARNING
Copying correlates so strongly with declining relative test scores that the final exam grades on analytic problems of a hypothetical student who copies all of his work average 2.42 ± 0.23 standard deviations below those of one who copies none, even though the two started within ~1/3 standard deviation of each other on physics tests prior to instruction at MIT. We now argue for causation: intellectual engagement with the online homework builds skill at analytic problems, and copiers, who forgo that engagement, are left with lower exam grades. Our argument has three legs: causation is consistent with prior scholarship and belief, copiers do learn physics when they actually do the relevant homework, and the most obvious alternative "common cause" explanations are contradicted by the data.
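The "hypothetical student" comparison compares the two ends of the copying scale. A minimal sketch, assuming the figure comes from an ordinary least-squares line fitted to each student's relative (standardized) exam score versus copying fraction, so that the predicted gap between copying fraction 1 and copying fraction 0 is simply the slope; the fitting details, the helper name, and the toy data are assumptions, and the uncertainty shown is the usual OLS standard error of the slope.

```python
from math import sqrt


def copy_gap(copy_fraction, z_score):
    """Least-squares fit z = a + b*f of relative exam score on copying fraction.

    Returns (b, se_b): b is the predicted score difference, in standard
    deviations, between a student who copies everything (f = 1) and one who
    copies nothing (f = 0); se_b is its ordinary least-squares standard error.
    Hypothetical helper, not the paper's analysis code.
    """
    n = len(copy_fraction)
    if n != len(z_score) or n < 3:
        raise ValueError("need at least three paired observations")
    f_bar = sum(copy_fraction) / n
    z_bar = sum(z_score) / n
    s_ff = sum((f - f_bar) ** 2 for f in copy_fraction)
    if s_ff == 0:
        raise ValueError("copying fractions must not all be equal")
    s_fz = sum((f - f_bar) * (z - z_bar) for f, z in zip(copy_fraction, z_score))
    b = s_fz / s_ff
    a = z_bar - b * f_bar
    rss = sum((z - (a + b * f)) ** 2 for f, z in zip(copy_fraction, z_score))
    se_b = sqrt(rss / (n - 2) / s_ff)
    return b, se_b


# Toy data lying exactly on a line with slope -2.42 (NOT the study's data).
fractions = [0.0, 0.1, 0.3, 0.6, 1.0]
scores = [0.4 - 2.42 * f for f in fractions]
slope, stderr = copy_gap(fractions, scores)
print(round(slope, 2))  # -2.42
```

With real data the points scatter about the line, and the standard error (±0.23 in the reported result) quantifies how well the slope is determined.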
First, there is considerable research showing that doing homework leads to greater learning [15,16], consistent with the frequently expressed belief among both teachers and students at MIT that doing homework is essential for examination success. Second, repetitive copiers do most of the homework problems relevant to the MBT, on which they show as much learning as noncopiers, demonstrating that they do learn physics when they exert effort. What copiers fail to learn as well as their noncopying colleagues is skill on the analytic problems, which occur later in the homework assignments and in the semester, when copiers copy the largest fraction. Third, copying homework correlates very strongly with declining exam performance but weakly or not at all with any measure of performance in math or physics prior to instruction. All these facts support the inference that doing online problems requiring analytic responses causes better examination performance on this type of problem.
Two reasonable alternative hypotheses (to "homework causes learning on analytic problems") are based on the idea of a common causative factor. Perhaps "poor physics and math skills" lead both to declining performance and to homework taking up a great deal of time, increasing the pressure to copy as the deadline approaches. This hypothesis is discredited because standardized tests and other measures of copiers' math and physics skills show that repetitive copiers are not weak enough to explain the large end-of-term performance difference. Furthermore, it fails to explain why repetitive copiers exert far less effort than other students in the days before the deadline.
A related alternative hypothesis is that "poor learning skills" retard the rate of learning of repetitive copiers. The fact that copiers exhibit comparable learning for the material on the MBT (for which they did the homework) argues that they have comparable learning skills for physics.
While a carefully controlled, randomized experiment could provide stronger evidence for causality, it seems reasonable to conclude that copying homework prevents intellectual engagement with analytic homework problems, reducing copiers' learning of skill on such problems.

VIII. OBSERVED FOURFOLD REDUCTION IN COPYING FOLLOWING COURSE FORMAT CHANGES
We now discuss changes that have been accompanied by a factor of 4 reduction in homework copying at MIT (see Fig. 10).
The lecture-recitation format used in 2003 was replaced with a large-scale implementation of studio physics [17] in all subsequent courses. The primary motivations for this change were to increase and personalize interactions between instructors and students and to introduce peer instruction. The course was divided into sections of ~75 students each; each section met for a total of 5 h per week with one professor and several teaching assistants. During class periods, students were given minilectures interspersed with questions answered using a personal response system, followed by peer instruction, hands-on experiments, and group problem-solving sessions, often at the board. Students were divided into groups of no more than three, and each group had access to a computer used to enhance demonstrations and collect experimental data. Each week, students were assigned two (MasteringPhysics) homework assignments totaling ~6 for-credit problems (vs ~10 in 2003) and one somewhat longer graded written homework assignment. The year 2004 was the second year of testing this new format in 8.01 (it was originally developed in 8.02), with about 150 students who voluntarily signed up for it; in 2005 the studio version became the standard version, with ~500 students and again two shorter electronic homework assignments per week. The Spring 2006 class was a well-developed version of studio physics for electricity and magnetism, with ~600 students and one short electronic assignment per week. The equation entry method in MasteringPhysics was changed from a text string to a symbolic equation builder, which did not allow direct copying or easy electronic transmission of a solution. Grading was pass-no record in all Fall semesters but ABC-no record in the Spring.
In addition to these steps, in 2005 some professors (about 2/3) showed their students the graph in Fig. 6 and pointed out that copying electronic homework endangered their chances of passing the course; some of these professors also gave more explicit admonitions, but others felt that this would be detrimental to their trusting relationship with their students and/or that their students should learn the consequences for themselves. Apparently this had little effect: the copying rate showed no significant decline at the time and remained similar to that of previous years.
Accompanying these changes, copying of electronic homework decreased by a factor of 4, from ~11% of submissions in Fall 2003 to under 3% in Spring 2006 (see Table I). The rate of failure (a D or F grade) also dropped, very roughly in proportion. We suspect that the reduction in homework copying is responsible for a significant part of this reduction in the failure rate.
We now offer our judgment of which of the several changes made in the course of reform were most responsible for the observed declines. Palazzo's meta-analysis [2] of self-reported academic dishonesty surveys found less cheating when students felt their teacher was more concerned with student learning than with certification via testing. We therefore suggest that the increased contact between students and teaching staff in studio physics, compared with lecture-recitation, had the largest effect on the 2003-2004 decline. (Calculations based on Fig. 5 indicate that the shortening of homework assignments would account for only ~1/5 of the nearly 50% reduction observed.) It is more difficult to attribute the decrease in copying in Spring 2006 to a single cause, but the switch to grades seems paramount, given the several suggestions in our interviews that "copying would not affect my grade under pass or fail." The switch to the equation builder in MasteringPhysics would seem to add some inconvenience, but only for the ~20% whose method of copying was logging into a friend's account. Judging from the data in Fig. 5, the shorter assignments (2006) do not appear decisive.

IX. SUMMARY
The message of this paper is that online homework copying can be detected, follows understandable temporal patterns, and is sufficiently prevalent that it very likely causes a significant fraction of course failures, especially in large lecture-based classes. Teachers who feel an obligation to help their students pass despite the students' moral shortcomings will therefore be encouraged by our finding that changes in course format and structure were followed by a factor of 4 reduction in homework copying.
In particular, we showed that 10% of submitted problems were copied in a traditional lecture-recitation course, graded pass-no record, that is required of nonphysics majors at MIT. The fraction of copied problems increased dramatically over the semester and was greater for later problems in long assignments and for problems of greater difficulty. Most students copied much less than 10% of their problems, but about 1/5 of the students copied over 30% of their term's work. An anonymous survey and interviews showed significantly more copying among students who were motivated by a desire to pass the course and/or to obtain a degree rather than by a desire to learn. Observations confirmed findings from previous self-reported surveys that being male or a potential business major is associated with greatly increased copying. Actual copying was ~1/2 more than self-reported copying (equivalently, the survey showed ~1/3 less copying than was detected), a discrepancy toward the low end of other reports comparing actual and self-reported cheating [2]. On our survey, copiers cited time pressure as the prime cause of their copying, which is consistent with the dramatic rise in copying over the term.
The correlation between copying of online homework and declining academic performance (relative to those who do not copy) is extraordinarily strong: about two standard deviations of relative change over one semester. It is also specific to the type of problems copied, namely those requiring an analytic response. There was no correlation between copying homework demanding analytic answers and the score improvement between pretest and post-test on conceptual and numerical problems covering material in the early part of the semester, when far less copying occurs. We argue that this specificity, as well as previous work on the positive effect of doing homework on learning, strongly suggests that not doing homework causes the correlated decline in relative test performance. Our survey revealed that MIT students copied written homework at least 50% more than online homework and that, nationally, students self-reported more academic dishonesty and were more morally tolerant of it than MIT students. This suggests widespread copying of written homework nationally, with concomitant course failure if written homework is as educationally effective as the online homework studied here.
We have observed a fourfold reduction in the amount of detected homework copying after the course was restructured in ways that previous cheating studies or the observed patterns of copying suggest would reduce copying. Steps taken included switching to studio physics format, providing more instructor contact, giving shorter and more frequent assignments, switching from pass-no record to grades, and discussing the correlation of copying and course performance with students.
In future work, it would be highly valuable to apply our copying detection algorithms to see if copying occurs at equal rates and is as serious a correlate of academic failure in lecture-recitation courses at other institutions. It is also important to determine whether the course reforms we made would reduce copying in other institutions. Since professors are reluctant to assume the role of enforcers, it is especially important to establish the effectiveness of such noncoercive measures.