Potential Problems with Data Processing.
The potential problems with data processing are legion, but most of them can be grouped under the headings in the following sections. It is safe to guess that every investigator has at some time committed or, at least, considered committing one of these sins.
Once an experiment is designed, proper execution demands that all of the data collected be used. The investigator is not entitled to select only those data he wishes to use, whether because they fit his hypothesis or for any other reason. There may be a great temptation to eliminate outliers (data points that lie at some distance from the other data points on a plot) to make the data less variable, to make them conform to a preconceived pattern, or to fit a convenient regression. Data editing is nevertheless an unethical practice, and it is by no means a new one. Several famous scientists, including Newton and Mendel, have been accused of editing their data to better fit their models. It is not clear when this might have been done. Perhaps it would be less objectionable if the editing were done after the model had been confirmed, in successive editions of a book, for example. Regardless of the time or the reason it occurred, such editing is unethical. There are ways of handling such data; for a discussion of them, the reader is referred to the section "Failing to report negative results."
At times, a perfectly wonderful hypothesis will generate perfectly awful data. There can be great pressures to produce wonderful data, but such data are not always forthcoming. For students, the professor may imply that these data are necessary to get the degree. For faculty, the chairman or dean may imply that wonderful data and the associated wonderful publications are necessary to get a promotion or even to keep their jobs. There may also be internally generated pressures. A scientist may have such a strong drive to succeed or to become famous that he thinks all of his experiments must result in wonderful data.
Whatever the source of the pressure, it may be so overwhelming that the scientist will resort to manufacturing data. Those of us not in that situation find this hard to conceive, but many investigators who have been guilty of this scientific crime say that such pressures were the cause. The only solution to this problem is to reduce the stress by maintaining a reasonable perspective on the purposes and missions of science and by talking to supervisors about the real requirements for advancement.
Using inappropriate statistical tests.
This is a common form of error (misconduct?) among scientists. Most often they do not even know that what they are doing is wrong. Take the scientist who collects data in a dose-response experiment. A range of doses of a particular agent produces graded responses from the subjects of the experiment. The investigator examines the data and concludes that the responses follow a sigmoid curve, one in which the rate of rise with increasing dose is slow at first, becomes fast in the middle range, and then slows again at high doses. The scientist would like to know the correlation between the dose given and the response obtained, so he calculates a Pearson's r coefficient of correlation. This coefficient measures linear association, but he has already concluded that the relation is sigmoid, not linear. Therefore, he has chosen the wrong correlation method. Similarly, an investigator may wrongly use a parametric test on data that he knows are not normally distributed. The usual cause of this sort of error is ignorance on the part of the investigator. The editor of the journal and the referees for the paper may be just as ignorant as the author, so the error may not be caught. On the other hand, ignorance is no excuse, as we have seen earlier; most institutions employ statisticians who know how to handle data properly. Our own institution has a number of biostatisticians available for consultation regarding the design, conduct and analysis of research experiments.
They should certainly be consulted when an investigator is in doubt. If the investigator has no reason to doubt his incorrect analysis, then he has not had proper training.
Violating the assumptions of the statistical test.
Just as investigators may use inappropriate tests, they may also violate the assumptions made by the test they use. (Of course, this would make the tests inappropriate as well!) For example, investigators, in reviewing their data, often decide that analysis of variance is the appropriate statistical treatment. They use it, violating the main assumption of the test, namely, that the experiment was designed to use analysis of variance. If you want to use analysis of variance, you have to decide to use it while you are designing the experiment, not after the data are gathered. An even bigger problem is encountered when experimenters enter their data into one of the myriad computer statistical packages and perform every test in the package, looking for any test that might yield "significant differences." It is doubtful that any statistician would endorse this practice.
Likewise, the selection of the criterion value (alpha, the level of significance against which the p value is judged) must be made before the experiment, not after it. Admittedly, most experimenters do not even think about the criterion value, blindly accepting the entirely arbitrary, conventional value of 0.05 or 0.01. Even so, the value must be chosen, explicitly or implicitly, before the experiments. Once the experimental data have been collected, it is too late to select a more lenient 0.06 criterion!
Similarly, an investigator may collect data at intervals after a control point, each data collection preceded by a treatment. For example, in a dose-response study an investigator could make control measurements and then treat the subject every hour with increasing doses of a drug. This, in itself, is not necessarily bad design, except that the order of presentation should probably be randomized.
However, the investigator may apply a t-test to compare the responses to different drug doses. This would be inappropriate because the t-test, in its usual unpaired form, requires that the data sets be independent. In this case, they clearly are not: each response can be influenced by the preceding one. As with the inappropriate use of statistical tests, violating assumptions is usually done in ignorance, and the same comments apply.
Performing multiple statistical tests.
When k independent tests are each performed at significance level alpha, the probability p that at least one of them yields a significant result by chance alone is p = 1 - (1 - alpha)^k. Therefore, the probability of rejecting the null hypothesis when it is, in fact, true (Type I error) increases as the number of tests increases. If the multiple tests are correlated or dependent, rather than independent, the probability will lie somewhere between the value of p and the value of alpha. The nature of the relationship between the number of tests and the probability is shown in Figure 3 for four different values of alpha. Values on the ordinate can be read as the actual probabilities that one or more significant differences could be due to chance. Feild and Armenakis (1974) point out two ways of minimizing the probability of incorrectly rejecting the null hypothesis in this way. An investigator, concerned about this problem, could
In any case, all investigators should be aware of the difficulties of multiple tests. If they are not, they could be fooling themselves.
A related error occurs when several analyses are performed, and "the significant outcomes are reported more faithfully than the insignificant outcomes" (Neher, 1967). For example, 20 different analyses could be performed in an experiment, of which 19 are insignificant at the 0.05 level and only one is significant. This result is about what one would expect from chance alone: a spurious significant finding, or Type I error. There are several ways that this probability pyramiding may be introduced into data. It can occur when an experimenter does several analyses in a single study and reports only the significant differences or, at least, concentrates on them. It can occur when an experimenter repeats an experiment over and over until a significant difference is obtained (by chance) and fails to mention in the report that this was the nth such experiment. Finally, probability pyramiding occurs more insidiously when experimenters and journal editors publish only experiments with "positive results."
It may be of some consolation to the poorer experimenter to know that if he repeats an experiment enough times, eventually he will obtain a significant difference. However, that result will have occurred by chance, not because of the effect of the independent variable on the dependent one. The probabilities resulting from pyramiding can be calculated from the formula presented in the last section. Therefore, if we assume that two studies are conducted for every one reported in print, then the real probability of a Type I error, given an alpha of 0.05, is 0.098. Increasing the ratio to three studies for every one in print changes the probability to 0.143. The same sort of analysis applies to the filtering of experiments with negative results by experimenters and editors.
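The pyramiding arithmetic can be checked with a few lines of Python. This is only a minimal sketch, assuming the formula p = 1 - (1 - alpha)^k and assuming that the k tests or repeated studies are independent; the function name `pyramided_error` is an illustrative choice and does not come from the original text.

```python
def pyramided_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error in k independent tests.

    Assumes the tests (or repeated studies) are independent and that each
    one is judged against the same criterion value alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # k = number of studies conducted for every one reported in print,
    # each evaluated at alpha = 0.05.
    for k in (1, 2, 3, 20):
        print(f"k = {k:2d}: p = {pyramided_error(0.05, k):.4f}")
```

For k = 2 and k = 3 this gives about 0.0975 and 0.1426, in line with the rounded values of 0.098 and 0.143 quoted in the text. For dependent tests the formula only gives an upper bound, since the true probability then lies between alpha and p.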
The problem is worsened when it is noted that these errors compound. Therefore, if there are two studies done for every one in print and one study rejected for every one accepted, then the actual probabilities become
Using "canned" computer software without questioning or examining results for accuracy.
The ubiquity of personal computers has made the job of data analysis easier. It has probably also increased the number of incorrect results reported in the literature. One cause of this increase is the use of commercially available or personally developed statistical software without ever determining that the software does, in fact, give the correct values for the calculated statistical parameters. An investigator must determine that the values given in the computer printout are the correct ones. Simply running the program does not ensure that the correct algorithm has been used to calculate the values, or that the mathematical formulae have been entered correctly into the program or applied properly to the data. These considerations are all in addition to the statistical questions regarding the appropriateness of particular tests. Every computer program comes with a disclaimer regarding the use of the program and the responsibility taken by the program seller for errors. Users who habitually trust computer software should read those disclaimers very carefully. Then they should sit down with the data, a pencil, paper and a calculator and verify the results by hand. That is the only way to be certain that the results are correct. This should be done enough times to test all of the different types of data that will be analyzed. It is a tedious, but necessary, job.
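As one way of putting this advice into practice, the sketch below compares a statistic computed step by step from its definitional formula, much as one would with pencil and calculator, against the value returned by a software routine. It assumes Python's standard `statistics` module as the stand-in for the "canned" software and uses a small, made-up data set whose variance is easy to work out by hand; neither the data nor the function names come from the original text.

```python
import math
import statistics


def hand_sample_variance(values):
    """Sample variance from the definitional formula, one step at a time."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def agrees_with_software(values, rel_tol=1e-9):
    """Return True if the hand calculation matches statistics.variance."""
    by_hand = hand_sample_variance(values)
    by_software = statistics.variance(values)
    return math.isclose(by_hand, by_software, rel_tol=rel_tol)


if __name__ == "__main__":
    # Hypothetical data: the mean is 5 and the squared deviations sum
    # to 32, so the sample variance should be 32 / 7.
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(hand_sample_variance(data))
    print(agrees_with_software(data))
```

A check of this kind would need to be repeated for each statistic, and for each type of data, that the software will be asked to handle.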