Recommendations for statistical analysis involving null hypothesis significance testing

Background: The peer review process for original research articles generally requires authors to adopt accepted scientific methods, identify testable hypotheses and test those hypotheses using appropriate and established statistical methods. In Sports Biomechanics, authors are encouraged to submit original research articles that conform to these norms. Despite its widespread use, null hypothesis significance testing (NHST) has been criticised on various counts, especially when it relies on p-values alone (as defined below). The p-value combines sample size, variance and the difference in values in a single calculation, but its meaning is subtle and difficult to communicate in non-technical language, which leads to over-simplification, distortion and misinterpretation. Technically, a p-value is the probability of obtaining an effect at least as extreme as the one in the sample data, assuming the null hypothesis is true (Chang et al., 2019). P-values are often interpreted as an estimate of the error rate in rejecting the null hypothesis, so that a p-value of 0.05 is taken to mean that the probability of a type I error is 5%. However, the true error rate (false discovery of positive cases or effects) in an isolated study may range from 23% to 50% when p = 0.05 (Vidgen & Yasseri, 2016). This has led some to criticise the use of p-values in NHST as being more liberal than is generally understood (Sellke et al., 2001; Vidgen & Yasseri, 2016).
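As a rough illustration of why p = 0.05 can correspond to an error rate well above 5%, the calibration proposed by Sellke et al. (2001) bounds the Bayes factor in favour of the null hypothesis from below by -e p ln(p) for p < 1/e. The Python sketch below converts that bound into a minimum posterior probability of the null hypothesis. The function names and the 50% prior probability for the null are assumptions made for illustration, and this is not necessarily the calculation behind the 23% to 50% range attributed to Vidgen & Yasseri (2016).

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound on the Bayes factor in favour of H0: -e * p * ln(p).

    The calibration is only defined for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("calibration applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_false_positive_risk(p: float, prior_h0: float = 0.5) -> float:
    """Lower bound on the posterior probability that H0 is true.

    prior_h0 is an assumed prior probability of the null hypothesis
    (0.5 means H0 and H1 are judged equally likely before the study).
    """
    if not 0.0 < prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie strictly between 0 and 1")
    prior_odds_h0 = prior_h0 / (1.0 - prior_h0)
    posterior_odds_h0 = prior_odds_h0 * min_bayes_factor(p)
    return posterior_odds_h0 / (1.0 + posterior_odds_h0)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        risk = min_false_positive_risk(p)
        print(f"p = {p}: minimum probability that H0 is true = {risk:.3f}")
```

With equal prior odds, p = 0.05 gives a lower bound of roughly 0.29, far from the 5% that a naive reading of the p-value suggests.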
Various limitations of NHST (p-values) have been identified, including, amongst others: use of p-values in isolation, without reference to effect sizes and confidence intervals; the potential for "p-hacking", where data and analyses are deliberately manipulated to reduce p-values; simplistic dichotomous interpretation of p-values as either significant or non-significant; incorrect interpretation of p > 0.05 as meaning no effect; the question of whether zero effect is always the comparison of clinical or practical interest (i.e. the smallest effect size of interest); misinterpretation of statistical significance as clinical or practical significance; and performing multiple statistical tests without adjusting the criterion p-value. Despite these limitations, critics generally acknowledge that NHST remains in common use and that many of its problems stem more from misunderstanding and misuse than from inherent flaws (Miller, 2009).
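The last limitation in the list, multiple tests without adjusting the criterion p-value, can be illustrated with a short sketch. The Python code below applies the Bonferroni and Holm corrections to a set of p-values. The source does not prescribe either method, so they are shown only as two common options, and the function names and example p-values are hypothetical.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is multiplied
    by (m - k), then a running maximum keeps the adjusted values monotone."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values from four tests run on the same data set.
    raw = [0.01, 0.04, 0.03, 0.20]
    alpha = 0.05
    rows = zip(raw, bonferroni_adjust(raw), holm_adjust(raw))
    for p, p_bonf, p_holm in rows:
        print(
            f"raw p = {p:.2f}  Bonferroni = {p_bonf:.2f}  Holm = {p_holm:.2f}  "
            f"significant after Holm: {p_holm < alpha}"
        )
```

In this hypothetical example three of the four raw p-values fall below 0.05, but only the smallest remains below 0.05 after either adjustment. Holm is never more conservative than Bonferroni, which is why it is often preferred when a simple family-wise correction is wanted.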
© Copyright 2020 Sports Biomechanics. Routledge. Published by Routledge. All rights reserved.

Subjects: biomechanics, research, methodology, error, mathematical statistics
Subject areas: technical and natural sciences, coaching science
DOI: 10.1080/14763141.2020.1782555
Published in: Sports Biomechanics
Published: Routledge, 2020
Volume: 19
Issue: 5
Pages: 561-568
Document type: article
Language: English
Level: advanced