more optimizations

2025-12-16 19:46:47 +01:00
parent 566b972465
commit efb66ac334
+19 -5
@@ -18,9 +18,12 @@ execute:
#| label: setup
#| include: false
source("deps.R")
dir.create(dir_output_plots, recursive = TRUE, showWarnings = FALSE)
# uncomment the following line to disable graphs & tables
#output_format <- ""
debug_mode <- FALSE
``` ```
\newpage
@@ -59,11 +62,11 @@ This raises the question: if modern research practices are so prone to bias and
The publication of Firebaugh's text coincided with the onset of the replication crisis, a period in which widespread replication failures, especially but not exclusively in psychology, revealed systemic issues in research culture. This crisis was not limited to a few fraudulent cases but exposed a broader problem: seemingly robust, highly cited studies could not be reproduced. Examples ranged from unintended errors to outright data fabrication [@barghAutomaticitySocialBehavior1996; @callawayReportFindsMassive2011; @crockerRoadFraudStarts2011a].
While the crisis began in psychology, it soon spread to other fields such as political science and economics [@breznauDoesSociologyNeed2021]. For instance, a classic social priming study by @barghAutomaticitySocialBehavior1996, which found that participants primed with an "elderly" stereotype walked more slowly, failed to replicate. A follow-up study suggested that the original results were likely influenced by experimenter expectations rather than the hypothesized mechanism of unconscious priming [@doyenBehavioralPrimingIts2012]. While some extreme cases are well documented, the crisis is largely seen as a result of systemic pressure and normal human behavior rather than of deliberate misconduct [@diekmannII2Probleme2022; @crockerRoadFraudStarts2011a; @4ff8afa9-5c92-3c50-b832-a1756ccbeedc].
The term "crisis" not only implies alarmingly high proportions but also creates pressure to act. This is supported by findings spanning many fields: finance [@jensenThereReplicationCrisis2023], economics [@briggsPartialSolutionReplication2023], sociology [@auspurgAusmassUndRisikofaktoren2014], and medicine [@begleyRaiseStandardsPreclinical2012], with some authors even claiming that most published research findings are false [@ioannidisWhyMostPublished2005]. But what drives this crisis?
## Questionable Research
One of the practices discussed earlier that distorts scientific progress is publication bias. @rosenthalFileDrawerProblem1979 defines it as the preference for publishing positive over negative or inconclusive results. This often institutionally driven bias, also called the 'file drawer problem', can occur at any stage in research [@kuhbergerPublicationBiasPsychology2014; @francoPublicationBiasSocial2014]. Contributing practices include selective reporting, where null findings or variables are omitted from analysis [@breznauDoesSociologyNeed2021], and the post-hoc adaptation of hypotheses [@gerberPublicationBiasEmpirical2008]. However, @breznauObservingManyResearchers2022 do not see publication bias as the main driver of the large variance in results. Instead, they emphasize the role of idiosyncratic researcher variability and the broader context of research practices. This leads to the problem of scientific practices that might produce unreliable or invalid results: so-called questionable research practices (QRPs).
A truth-incentivizing survey of over 2000 psychologists revealed a high prevalence of QRPs. Around 60% admitted to not reporting all dependent measures, 50% to selective reporting, and 30% to falsely claiming they predicted an unexpected finding. About 2% even confessed to data falsification [@johnMeasuringPrevalenceQuestionable2012a]. Criminology shows similar patterns, though with lower rates due to the absence of incentives [@chinQuestionableResearchPractices2023].
@@ -127,7 +130,7 @@ A focused literature review on adoption produced limited evidence as we still kn
Self-reports suggest high OSP familiarity, but they co-exist with widespread QRPs and are vulnerable to bias. In @chinQuestionableResearchPractices2023, 89% of respondents said they had used at least one OSP, yet 87% also admitted at least one QRP, and the prevalence of some serious QRPs (e.g., hiding known problems) was non-trivial. Survey data indicate that about 25% of researchers across fields have preregistered a study, with higher uptake in psychology (50-60%) and lower prevalence in sociology (~30%) [@fergusonSurveyOpenScience2023a]. Another survey in the field similarly estimated preregistration use at 45% (42-49%) [@chinQuestionableResearchPractices2023].
The reported prevalence of OD varies widely across disciplines. Survey data suggest that more than 60% of researchers report having posted data or code, with higher rates in psychology (>50%) compared to sociology (~35%) [@fergusonSurveyOpenScience2023a]. The prevalence of OM sharing is more limited than that of OD and access. Survey results indicate that 43% (40-47%) of researchers report providing access to their research materials [@chinQuestionableResearchPractices2023]. Few or no journals in the field require data sharing, preregistration is rare, and replication studies make up only a tiny share of publications [@pridemoreReplicationCriminologySocial2018].
In their survey at the Netherlands Institute for the Study of Crime and Law Enforcement, @moneva2025attitudes find broadly positive attitudes but divergent views by method and career stage, and a long list of cultural, structural, legal/privacy, and cost barriers. @fessingerStateOpenScience2025 also shows strong approval (88% positive) and some experience (58% tried at least one OSP), but routine adoption looks limited (only 44% even hold a repository account). In contrast, an assessment of social science studies between 2014 and 2017 found no preregistered studies at all [@hardwickeEmpiricalAssessmentTransparency2020].
Article audits show far lower OSP uptake than surveys, implying either nondisclosure or overestimation. @greenspanOpenSciencePractices2024 coded 722 articles (2018-2022) across five leading journals and found OM in about a third of papers, but \<10% with OD, \<2% with open code or preregistration, and no upward trend.
@@ -157,7 +160,7 @@ In summary, the study population consists of all statistical-inference publicati
The sampling procedure involved drawing a large enough sample for training using sequential sampling, in this specific context called active learning [@chickSequentialSamplingEconomics2012]. Faced with expected challenges in full-text acquisition, a rather demanding training pipeline, and an unexpectedly low anticipated OSP prevalence, the sequential sampling approach was abandoned and an alternative approach was established.
The sample size was determined by a precision-based calculation to ensure a confidence interval of $\pm$ 1.5 percentage points for the SI prevalence, as a precision-based sample size calculation was deemed more suitable for an exploratory prevalence study [@blandTyrannyPowerThere2009]. Calculations were based on prevalences arbitrarily estimated using the results of the literature review described in @sec-osp-in-crim, explained further in the provided supplements. The calculated minimum total sample size was $\approx$ 4265 publications to achieve a 95% confidence interval with a half-width of $\pm$ 1.5 pp using the @agrestiApproximateBetterExact1998 method.
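
To make the precision-based calculation concrete, the following is a minimal sketch and not the calculation documented in the supplements. It assumes a single worst-case anticipated prevalence of 0.5 (an assumption made here for illustration; the text bases its calculation on prevalences estimated from the literature review) and searches for the smallest sample size whose anticipated Agresti-Coull half-width does not exceed 1.5 percentage points. The function names `ac_half_width` and `ac_sample_size` are hypothetical.

```r
# Sketch only: function names and the worst-case prevalence are assumptions.

# Anticipated half-width of the Agresti-Coull interval for n observations
# and an anticipated proportion p (expected count x = n * p).
ac_half_width <- function(n, p, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  n_tilde <- n + z^2                      # adjusted sample size
  p_tilde <- (n * p + z^2 / 2) / n_tilde  # adjusted proportion
  z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
}

# Smallest n whose anticipated half-width does not exceed the target.
ac_sample_size <- function(p, half_width, conf_level = 0.95, n_max = 1e6) {
  n <- 1
  while (ac_half_width(n, p, conf_level) > half_width) {
    n <- n + 1
    if (n > n_max) stop("no n <= n_max reaches the requested precision")
  }
  n
}

# Worst case p = 0.5, target half-width of 1.5 percentage points.
n_required <- ac_sample_size(p = 0.5, half_width = 0.015)
print(n_required)
```

Under this worst-case assumption the search returns 4265, which agrees with the figure reported above. Anticipated prevalences further from 0.5 require fewer publications for the same half-width.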
First, Sample A, a random sample of around 500 publications, was manually classified to train the initial SI classifier. This step also helped estimate the effort for subsequent tasks. Next, an independent Sample B was drawn, stratified by year, thereby addressing problems in cross-validation and the violation of the independence-of-residuals assumption made by many machine-learning models [@robertsCrossvalidationStrategiesData2017].
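
The two-step draw can be sketched in base R as follows. This is an illustration under stated assumptions, not the pipeline used for the study: the sampling frame, its size, the year range, the seed, and the 20% sampling fraction are all hypothetical, and "independent" is read here as "not overlapping with Sample A". Only the column name `published_year` is taken from the plotting code in this document.

```r
# Sketch only: the frame, sizes, seed, and sampling fraction are hypothetical.
set.seed(42)

# Hypothetical sampling frame with one row per publication.
frame <- data.frame(
  id = seq_len(5000),
  published_year = sample(2013:2022, 5000, replace = TRUE)
)

# Sample A: simple random sample used to train the initial classifier.
sample_a_ids <- sample(frame$id, size = 500)

# Sample B: drawn from the remaining publications, stratified by year
# with the same sampling fraction in every stratum.
remaining <- frame[!frame$id %in% sample_a_ids, ]
sampling_fraction <- 0.2
by_year <- split(remaining, remaining$published_year)
sample_b <- do.call(rbind, lapply(by_year, function(stratum) {
  n_draw <- ceiling(nrow(stratum) * sampling_fraction)
  stratum[sample(nrow(stratum), n_draw), , drop = FALSE]
}))

# Publications drawn per year in Sample B.
print(table(sample_b$published_year))
```

Proportional allocation keeps the year distribution of Sample B close to that of the frame. Equal allocation per year would be an alternative if rare years need to be represented more strongly.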
@@ -359,6 +362,9 @@ if(output_format == "pdf/tex") {
}
ggsave(file.path(dir_output_plots, "fig01-fig-freq-pubs-comp.eps"), (p1|p2)/(p3|p4), width = 8, height = 6, device = "eps")
ggsave(file.path(dir_output_plots, "fig01-fig-freq-pubs-comp.jpg"), (p1|p2)/(p3|p4), width = 8, height = 6, dpi = 800, device = "jpeg")
if (isTRUE(debug_mode)) {
  debug_info[[knitr::opts_current$get("label")]] <-
    if (knitr::is_html_output()) "HTML" else "LaTeX"
@@ -686,6 +692,10 @@ p <- ggplot(yearly_long, aes(x = published_year, y = prop, color = variable)) +
tbl_osp_prev_overall_dsadj <- yearly_long
print(p)
ggsave(file.path(dir_output_plots, "fig02-fig-osp-adoption.eps"), p, width = 7, height = 5, device = "eps")
ggsave(file.path(dir_output_plots, "fig02-fig-osp-adoption.jpg"), p, width = 7, height = 5, dpi = 800, device = "jpeg")
if (isTRUE(debug_mode)) {
  debug_info[[knitr::opts_current$get("label")]] <-
    if (knitr::is_html_output()) "HTML" else "LaTeX"
@@ -1090,6 +1100,10 @@ grid_publishers <- ggplot(
)
print(grid_publishers)
ggsave(file.path(dir_output_plots, "fig03-fig-osp-time-by-publisher.eps"), grid_publishers, width = 9, height = 9, device = "eps")
ggsave(file.path(dir_output_plots, "fig03-fig-osp-time-by-publisher.jpg"), grid_publishers, width = 9, height = 9, dpi = 800, device = "jpeg")
if (isTRUE(debug_mode)) {
  debug_info[[knitr::opts_current$get("label")]] <-
    if (knitr::is_html_output()) "HTML" else "LaTeX"
@@ -1164,4 +1178,4 @@ if (isTRUE(debug_mode)) {
  print(paste0("Output Format set to **", output_format, "**"))
}
``` ```