In a recent exchange on Twitter the Thesiswhisperer wondered why effects were disappearing, as mentioned in this New York Times article. The feeling is that this should not happen with modern scientific culture, and yet I suspect modern academic scientific culture is partly to blame. To explain why, I have to introduce a little-known but rather simple statistical phenomenon called regression towards the mean.
Let me do it by giving a common example used in teaching regression towards the mean. The lecturer tells a class that he has the ability to improve people's psychic ability. He then writes on a piece of paper a number between 1 and 100 without showing the class. He asks the class to write down what they think the number is, then reveals the number and asks the class how well they did.
Now the experiment starts. He takes a sample from the class of the people who got furthest from his value, say the worst half. He points out that these are obviously the less psychic, chosen so that the effect will show. With these he performs some ritual, perhaps having them stand up, turn round three times and say “esn-esn-on” (“nonsense” backwards).
Then he repeats the experiment, but this time with only this worst half, and lo and behold, they perform better. That is, their average guess is closer to his value than it was in the previous round.
The lecturer then admits there is no psychic ability involved in this, so what is going on? The trick is in the selection. Indeed, if he compared the standard deviation for the whole class at the start with the standard deviation for the sample in the second round, they should be of approximately the same size. People are choosing their numbers pretty much at random; those who guess badly at first do so at random, and if he had taken the full class the second round's performance would have been much the same as the first, only with different people doing well.
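To see that nothing but selection is at work, here is a minimal simulation of the demonstration (the secret number, class size and random seed are my own arbitrary choices):

```python
import numpy as np

# A minimal sketch of the classroom demonstration: guesses are uniform
# on 1..100, so nothing psychic is going on at any stage.
rng = np.random.default_rng(0)
target = 57            # the lecturer's secret number (arbitrary)
n_students = 100

# Round 1: everyone guesses; error is distance from the target.
round1 = rng.integers(1, 101, size=n_students)
err1 = np.abs(round1 - target)

# Select the "least psychic" half -- the worst performers in round 1.
worst_half = np.argsort(err1)[n_students // 2:]

# Round 2: the selected students guess again, just as randomly.
round2 = rng.integers(1, 101, size=worst_half.size)
err2 = np.abs(round2 - target)

print(f"Round 1 mean error of worst half: {err1[worst_half].mean():.1f}")
print(f"Round 2 mean error of same group: {err2.mean():.1f}")
# The second number is reliably smaller: the group was selected for
# extreme errors, and fresh guesses regress towards the overall mean.
```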
Regression to the mean basically means that if you select a biased subset from a distribution, fresh data from that subset will tend to move back towards the mean.
So what has this to do with non-repeatability? For starters, I am not belittling this phenomenon; I have been involved in studies which aimed to replicate a previously carried out study. The prior study reported a huge effect, so the power calculation required a small sample size, indeed so small we upped the numbers just to persuade the ethics committee that this was a genuine attempt to replicate. Only to find, once the data were collected, that there was no effect visible in them. So I have experienced non-repeatability.
Nor am I accusing researchers of bad practice. They are honestly reporting the results they get. It is the ability to report the results, i.e. the selection process applied by journals, that produces the phenomenon!
Published results aren't just a random sample of all results. They are selected for results that appear to demonstrate a genuine effect, particularly those that are significant at the p=.05 level. However, gaining a p-value of less than .05 (or any value) is no guarantee that you have a true effect. For a start, with p=.05 one in twenty of the studies where there is genuinely no effect will be reported as having one. That isn't one in twenty of reported results (it might be a lot higher or lower) but one in twenty of studies where NOTHING is genuinely happening. Unfortunately we don't know which ones these are: each looks like a result even though there is no effect behind it. We know there are Type I errors; our selection criteria for publication let through one in twenty papers where there is genuinely no effect.
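Here is a quick sketch of that one-in-twenty claim: simulate many studies in which both groups come from the same distribution, and count how many reach p < .05 anyway (the group sizes and seed are arbitrary):

```python
import numpy as np
from scipy import stats

# Run many studies where there is genuinely NO effect, and count how
# many nevertheless come out significant at p < .05.
rng = np.random.default_rng(1)
n_studies, n_per_group = 10_000, 30

false_positives = 0
for _ in range(n_studies):
    a = rng.normal(0, 1, n_per_group)   # both groups drawn from the
    b = rng.normal(0, 1, n_per_group)   # same distribution: no effect
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"Fraction of null studies looking 'significant': "
      f"{false_positives / n_studies:.3f}")
# Prints roughly 0.05 -- one in twenty null studies looks like a result.
```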
But what if there is an effect? Well, we are more likely to detect it if our sample happens to over-estimate the effect than if it under-estimates it. In other words, there are studies out there where there was a genuine effect but it did not get published because they drew an unfortunate sample. On the other hand, in well designed experiments all the studies that draw fortunate samples are likely to be significant. So the tendency is to over-estimate actual effects, because the selection criteria for publication favour those who draw fortunate samples.
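A small simulation makes the over-estimation visible. Assume a true but modest effect and "publish" only the studies that reach significance; the effect size and sample size below are illustrative assumptions, not taken from any real study:

```python
import numpy as np
from scipy import stats

# Publication bias: the true effect is real but small, and only studies
# reaching p < .05 get "published".
rng = np.random.default_rng(2)
true_effect, n_per_group = 0.2, 30     # illustrative assumptions

published = []
for _ in range(10_000):
    a = rng.normal(true_effect, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:                       # the journal's selection criterion
        published.append(a.mean() - b.mean())

print(f"True effect:           {true_effect}")
print(f"Mean published effect: {np.mean(published):.2f}")
# Published studies are the ones that drew fortunate samples, so the
# average published estimate sits well above the true effect.
```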
This is not news; I have not suddenly had some inspiration. There are learned reviews exploring this very topic. There are also approaches for when results start behaving in this way. One is to look at the sample size it would take to detect a difference between the original results and the fresh results: if that is larger than the combined size of the two studies, then the discrepancy may well be just the result of regression to the mean (a rough sketch of this check follows below). It is also why clinicians are moving towards meta-analysis rather than the results of just one study, although meta-analysis itself is hampered by publication bias.
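As that sketch, one could ask a standard power calculation how many participants it would take to detect the gap between the two reported effects. The figures below are hypothetical, and treating the difference of two standardised effects as itself the effect size to detect is a simplification:

```python
from statsmodels.stats.power import TTestIndPower

# Suppose the original study reported a standardised effect of d = 0.8
# and the replication found d = 0.3 (both figures are hypothetical).
d_original, d_replication = 0.8, 0.3

# How big a study would it take to reliably detect a gap of that size?
n_needed = TTestIndPower().solve_power(
    effect_size=d_original - d_replication,  # difference to detect
    alpha=0.05,
    power=0.8,
)
print(f"Participants needed per group: {n_needed:.0f}")
# If this exceeds the combined size of the two studies, the drop in the
# effect may be nothing more than regression to the mean.
```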
I also want to sound a warning: skewed data (data where a few people produce very high scores) can quite easily produce oddball results. This causes problems when sampling. There are statistical methods for analysing such data, but I have rarely seen them applied outside the statistics classroom.
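As an illustration, here is what small samples from a heavily skewed (lognormal) population do to the sample mean; the distribution parameters are arbitrary:

```python
import numpy as np

# A lognormal population where a few individuals produce very high
# scores. Small-sample means bounce around far more than the textbook
# normal intuition suggests.
rng = np.random.default_rng(3)
population_mean = np.exp(2**2 / 2)   # lognormal(0, 2) mean, about 7.4

sample_means = [rng.lognormal(0, 2, 30).mean() for _ in range(10)]
print(f"Population mean: {population_mean:.1f}")
print("Ten sample means:", [f"{m:.1f}" for m in sample_means])
# Depending on whether a sample catches one of the rare extreme scorers,
# its mean can be a fraction of, or several times, the true mean.
```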
So yes, I would expect the published results of effects, on the whole, to be over-estimates. The over-estimation is a product of the current scientific publishing culture. There are some approaches that alleviate this problem, but at present no cure, because the cure involves a change of culture.