Statistics of Scientific Fraud
A scientometric study estimating the percentage of fabricated experimental data
in the biomedical scientific literature at roughly 5-10%
An awkward biochemical procedure of graphical data presentation has
created a unique trap for unscrupulous researchers. "Absent-minded" fabrication
of a certain type of this procedure's output very often results in an "impossible"
picture that physically cannot be based on any real data. Such cases seem to
be sufficiently frequent to provide statistics for quantitatively measuring scientific fraud.
Scatchard plot analysis is an extremely popular procedure in many fields
of biochemistry, immunology, pharmacology, etc. The essence of this procedure
is that the saturation isotherm (i.e. the amount of substrate (ligand) bound
to some binder, measured as a function of the added (or Total) amount
of this substrate; see the left panel of the figure) is redrawn on a very
strange coordinate plane with Bound on the X-axis and Bound/Free
on the Y-axis (where Free=Total-Bound); see the graph
in the center of the figure.
If this is the first time you have heard about Scatchard plots, you probably
have the feeling, and rightly so, that it is rather difficult to understand what
experimental data should look like on such a coordinate plane. Indeed,
this graphical transformation (which does actually have some meaning) converts
a plain and clear presentation of experimental data into something lying
entirely beyond the understanding of most biomedical researchers. One funny
consequence is that fabricating a trustworthy Scatchard plot also seems to
be too difficult a job for them.
The picture above illustrates a dreadful pitfall in the process of fabrication.
The left panel represents the typical scenario of two "parallel" binding
experiments, i.e. two almost identical experiments giving pairs of experimental
points with the same Total concentration and different Bound
concentrations. Of course, on the usual coordinate plane this results in pairs of
experimental points lying precisely one under another.
On the Scatchard plot both the X- and Y-coordinates are linear functions of the Bound
signal, so (perhaps with some effort) you may understand that these pairs
should lie approximately on a line drawn through the origin,
as shown in the central panel of the figure. Indeed, Y/X = 1/Free, and the Free concentration
usually may be considered equivalent to Total, which is the same for both points of a pair.
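The geometry described above can be checked numerically. The following is only a minimal sketch: the function name scatchard_point and the duplicate measurements are invented for illustration and are not taken from any real experiment. It maps pairs of points with the same Total and different Bound onto the Scatchard plane and compares the slope of the ray from the origin to each point with 1/Total.

```python
def scatchard_point(total, bound):
    """Map one (Total, Bound) measurement to Scatchard coordinates (X, Y)."""
    free = total - bound          # Free = Total - Bound, as defined in the text
    return bound, bound / free    # X = Bound, Y = Bound/Free


# Hypothetical duplicate measurements: (Total, Bound in run A, Bound in run B).
# These numbers are made up for illustration only.
duplicates = [
    (100.0, 1.8, 2.2),
    (300.0, 4.5, 5.5),
    (1000.0, 8.0, 10.0),
]

for total, bound_a, bound_b in duplicates:
    xa, ya = scatchard_point(total, bound_a)
    xb, yb = scatchard_point(total, bound_b)
    # The slope of the ray from the origin to a point is Y/X = 1/Free,
    # which stays close to 1/Total while Bound is small compared with Total.
    print(f"Total={total:g}: A=({xa:.2f}, {ya:.5f})  B=({xb:.2f}, {yb:.5f})  "
          f"slopes {ya / xa:.5f} and {yb / xb:.5f}, 1/Total={1 / total:.5f}")
```

In this sketch the two points of each pair have different X values and nearly equal slopes, so they sit along a ray from the origin rather than one under another.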
Now let's imagine that you are trying to fabricate a nice-looking Scatchard
plot for such an experiment directly, without first fabricating data on the normal coordinate
plane. Obviously, if you are not very careful, you will almost certainly
draw pairs of points lying one under another (right panel of fig.1).
In my view, this peculiarity in a Scatchard plot is quite sufficient
grounds for an accusation of data fabrication. It is almost impossible to imagine
a real dataset that could correspond to such a plot, or at least obtaining one
would require enormous and absolutely meaningless effort. I also do not see
what sort of "honest" error could lead to this appearance.
Yet a pair of official opinions (just the simplest
bureaucratic figures of speech, from the journal Nature and from ORI)
do not share my point of view, at least in part.
Of course, this is not the most important type of scientific fraud;
it may be called 'absent-minded' data falsification. But it may have
interesting applications. After I found a case
of this folly in an article in Nature, I clearly understood
that it is really a very widespread type of error and that it may provide a unique
database for quantitatively measuring scientific fraud.
Indeed, Scatchard plot analysis is an enormously popular procedure in
the biomedical sciences; the citation index shows over 1000 citations of Scatchard's
original article every year. About 3-4 times more authors use his plot
without citing anything. So simply browsing scientific journals and counting
falsified plots may solve two problems:
1. It makes it possible to estimate directly the percentage of fabricated
experimental data in the biomedical scientific literature simply by dividing the
number of falsified plots (i.e. those like the right panel of fig.1) by the number
of correctly drawn plots of parallel experiments (i.e. those looking like the
central panel of fig.1).
2. Scatchard's article dates from 1949, so it may also be possible
to estimate the trend of this percentage over the last 30 years
in order to check the popular opinion that the quality of science has seriously
deteriorated today in comparison with its heyday in the sixties.
Apparently, the most convenient way to collect statistics for this study
is to browse the Journal of Biological Chemistry: a year's issues
of this journal usually contain about 200 articles displaying various
sorts of Scatchard plots. So a survey of this journal may be quite sufficient
for the first goal listed above, though, I guess, estimating the historical
trend of fraud may require a more thorough study.
So far, I have managed to browse this journal's issues
for 1992. Take a look, if interested, at the technical
data. The result is that I found 32 correctly drawn "parallel"
Scatchard plots and two fabricated plots.
It is not quite rigorous, but I may also add the above-mentioned case
in Nature, estimating that the number of Scatchard plots I have come across
by chance since I started this project is less than the number contained in
half a year's issues of J.Biol.Chem.
So the "initial estimate" based on three cases is that the percentage
of fabricated plots is somewhere around 5-10%.
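The arithmetic behind this estimate can be written out as a short sketch. The counts for 1992 (32 correct, 2 fabricated) come from the text; the figure of at most 16 extra correct plots for the Nature case is my own assumption, obtained by halving the yearly count of 32, and the function name fabrication_ratio is hypothetical.

```python
def fabrication_ratio(fabricated, correct):
    """Fabricated plots divided by correctly drawn parallel plots,
    which is the ratio defined in point 1 of the text."""
    return fabricated / correct


# Counts reported in the text for J.Biol.Chem., 1992.
correct_jbc = 32
fabricated_jbc = 2

# ASSUMPTION: the plots seen by chance amount to less than half a year of
# J.Biol.Chem., so they add between 0 and 16 (= 32 / 2) correct plots.
extra_correct_max = correct_jbc // 2

jbc_only = fabrication_ratio(fabricated_jbc, correct_jbc)
with_nature_low = fabrication_ratio(fabricated_jbc + 1,
                                    correct_jbc + extra_correct_max)
with_nature_high = fabrication_ratio(fabricated_jbc + 1, correct_jbc)

print(f"J.Biol.Chem. 1992 only:      {jbc_only:.2%}")
print(f"with the Nature case, range: {with_nature_low:.2%} to {with_nature_high:.2%}")
```

Under these assumptions every variant of the ratio stays inside the 5-10% band.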
This number may be higher if there is a significant additional number
of correctly fabricated Scatchard plots. Yet I don't think there are
such cases in reality; in my view, correct falsification demands far more wit from the
author than honestly conducting the whole experiment, and
therefore I don't see why he would resort to data falsification.
This estimate does not mean that only 5-10% of biomedical researchers
fabricate data. Obviously, every research paper usually contains data from
several different types of experiments, Scatchard plot analysis being one of them.
So the percentage of articles with fabricated pictures for at least one
of them should be several times higher, that is, about 20-30%. Moreover, every
researcher participates in writing more than one article during his lifetime.
Therefore, the conclusion is that:
SCIENTIFIC FRAUD IS A COMMON PRACTICE.
MOST LABORATORY RESEARCHERS IN BIOMEDICINE FABRICATE DATA FROM TIME TO TIME.
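The step from the per-plot percentage to the per-article percentage can be illustrated with a rough sketch. It rests on assumptions that are mine and not stated in the text: each type of experiment in an article is fabricated independently with the same probability as a Scatchard plot, and "several" means 3 or 4 types per article. The function name article_fraud_probability is hypothetical.

```python
def article_fraud_probability(p_plot, n_experiment_types):
    """Probability that at least one of n experiment types in an article is
    fabricated, ASSUMING independence and the same rate p_plot for each."""
    return 1.0 - (1.0 - p_plot) ** n_experiment_types


# p_plot spans the 5-10% estimate from the text; n = 3 or 4 is an assumed
# reading of "several different types of experiments".
for p_plot in (0.05, 0.10):
    for n_types in (3, 4):
        share = article_fraud_probability(p_plot, n_types)
        print(f"p_plot={p_plot:.0%}, types={n_types}: "
              f"articles affected = {share:.1%}")
```

With these inputs the sketch gives values from about 14% to about 34%, which is of the same order as the 20-30% quoted above; a different number of experiment types per article would shift the result accordingly.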
While dealing with Scatchard plots, I have found at least two other similar
indicators of data fabrication. Unfortunately, they cannot serve for numerical
estimation; but being substantially more widespread, they certainly support
the impression that my conclusion above is correct.
For a more detailed report, read my article on
trivial errors in Scatchard plot analysis; in brief, the first of these
indicators is that the scatter of data points around the trend line on most
Scatchard plots does not comply with the usual property of physical measurement
that small signals are measured with a bigger relative error (CV). Instead,
the points seem to follow the common notion of a "nice looking curve". It is
not a conclusive indicator but, as I wrote, it just supports an impression.
The other indicator is again a funny one. There is another type
of binding experiment, called an inhibition (or displacement) experiment, resulting
in, roughly speaking, the same curve but turned upside down. The Scatchard
transformation cannot be applied to such a curve directly (just as a floppy disk
cannot be read if it is inserted upside down). And if an obvious modification
is used, it physically cannot result in a "nice looking curve". Nevertheless,
a great many papers present really nice Scatchard plots derived from inhibition
curves. I think this is impossible and therefore all these plots were fabricated, but
I am not sure. Several times I tried to obtain explanations from authors
about the method of deriving their Scatchard plots but, of course, there were
no responses. So, again, this indicator just supports my impression.
In January 2013 I repeated the experiment. The advent of internet technologies has made it much easier to conduct:
I simply performed a Google Images search for "scatchard plot".
The result was quite compatible with the old data: I found another 5 instances of fabricated Scatchard plots (the first article being represented by two almost identical graphs).
That is per approximately 50-60 "correct" parallel Scatchard plots. Therefore the estimate of the percentage of fabricated plots is now closer to 10%.
Perhaps this is some rise from my estimates made in the mid-nineties. More importantly, this result is now much more statistically valid.