adds stuff

2026-04-16 19:18:19 +02:00
parent c4a46fc21b
commit ada154a107
3 changed files with 175 additions and 21 deletions
@@ -1,4 +1,62 @@
 # Introduction
 This repository contains the quarto project for the article "Mining Transparency: Assessing Open Science Practices in Crime Research Over Time Using Machine Learning".
-Extensions:
+This project only contains the replication files for the manuscript. The scraping, metadata download and classifier code is available in the method report which can be found in the OSF repository.
- [kapsner/authors-block](https://github.com/kapsner/authors-block): brings the capability to add an author-related header block when rendering docx-documents with Quarto.
+
 ## How to run
 In linux:
 ```bash
 make all
 ```
 Windows:
 ```bash
 uninstall windows, install linux, run "make all" in linux terminal
 ```
 ## Technical Requirements
 The method report requires rather intense calculations; this manuscript should run on a simpler machine. The project was written and tested on a linux machine but should also run on windows and macOS. The project is set up to run in a virtual R environment using the `renv` package, which ensures that all necessary packages and their specific versions are installed for the project to run correctly. The project also relies on Quarto for rendering the documents.
 ### Dependencies
 - R (4.5.1+)
 - renv R-library
 - Quarto
 - pandoc
 There are two packages that might need to be installed beforehand:
 - [gtsummary](https://www.danieldsjoberg.com/gtsummary/)
 - [ggthemr](https://github.com/Mikata-Project/ggthemr)
 For the R package `gtsummary`, you'll need to install the `libv8` library manually if on linux. Windows installation should work right away; refer to the [manual](https://www.danieldsjoberg.com/gtsummary/). More info on how to install on arch can be found [here](https://aur.archlinux.org/packages/v8-r). Alternatively, the environment variable `DOWNLOAD_STATIC_LIBV8` can be set to "1". See `globals.R` for more on requirements and for all necessary packages that should (!) be automatically installed when you run the `renv::restore()` command.
 ggplot plots are generated using `ggthemr`, which can be installed using `devtools`. The installation is explained in the [git repository](https://github.com/Mikata-Project/ggthemr) of `ggthemr`.
 ::: callout-important
 It is important to install the dependencies of gtsummary as well as the R packages devtools and ggthemr before restoring the virtual R environment.
 :::
 It is also important to note that a full run of the document requires environment variables to be set in the `.Renviron` file. Here is an example:
 ```{bash}
 ❯ cat ~/.Renviron 
 OPENAI_API_KEY = "sk-proj--zt7maBiONziZFYlVKuXnGOmmuZkhSjjNwI[...]"
 DOWNLOAD_STATIC_LIBV8=1
 RENV_CONFIG_SANDBOX_ENABLED = FALSE
 ```
 `OPENAI_API_KEY` has to contain the api-key for the OpenAI API, `DOWNLOAD_STATIC_LIBV8` is set to 1 for a quicker install of `libv8` (see the installation instructions of `gtsummary` on linux) and `RENV_CONFIG_SANDBOX_ENABLED` is set to `FALSE` simply to reduce warnings. The latter can be left out with no negative effect except some warnings during all steps involving multiprocessing.
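Before a full run, it may help to confirm that the expected variables are actually present in the `.Renviron` file. The following is only a minimal sketch: `check_renviron` is a hypothetical helper that is not part of this repository, and it only checks that a line assigning each variable exists, not that the value is valid.

```bash
# Hypothetical helper (not part of the repository):
# check_renviron FILE VAR [VAR ...]
# Prints "missing: VAR" for every variable without an assignment line in FILE
# and returns a non-zero status if at least one is missing.
check_renviron() {
  cr_file="$1"
  shift
  cr_missing=0
  for cr_var in "$@"; do
    # Accept both "NAME=value" and "NAME = value" at the start of a line.
    if ! grep -Eq "^[[:space:]]*${cr_var}[[:space:]]*=" "$cr_file" 2>/dev/null; then
      echo "missing: ${cr_var}"
      cr_missing=1
    fi
  done
  return "$cr_missing"
}
```

It could be used, for instance, as `check_renviron ~/.Renviron OPENAI_API_KEY && make all`, so that the build only starts when the key is configured.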
 Quarto Extensions:
 - [kapsner/authors-block](https://github.com/kapsner/authors-block): brings the capability to add an author-related header block when rendering docx-documents with Quarto.
 ## License
 This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
@@ -46,6 +46,23 @@ source("deps.R")
 This document serves as a supplement to the main article, providing additional details on the sampling approach, sample size determination, model training procedures, and evaluation metrics used in the study of Open Science Practices (OSP) adoption in scientific publications. The full methodological report, containing all code necessary for full replication can be accessed in the OSF repository.
 # Data Availability
 ## Legal Considerations
 The TDM policies of all publishers whose content appears in the corpus were reviewed individually. The majority of publishers explicitly permit TDM under institutional access or open-access conditions, including SAGE, Cambridge University Press, Taylor & Francis, Nature, Emerald, Annual Reviews (upon request), and Wiley (via API). Elsevier permits TDM via its official API but not scraping. Oxford University Press permissions depend on the institutional library agreement, and Brill permits TDM depending on the individual article licence. ASCE explicitly prohibits TDM. For publishers where no explicit policy was found (Modern Law Review) or where policies were ambiguous, EU Directive 2019/790 (Articles 3 and 4) was applied as the operative framework, which broadly permits TDM for scientific research purposes by authorised users accessing content through institutional subscriptions. MDPI and Internet Policy Review content is fully open access and permissively licensed. Regardless of whether TDM is permitted, the right to mine does not extend to a right to redistribute the underlying full texts, which is the operative restriction on public data sharing in this study. Therefore, only metadata can be made available. The labelled dataset containing metadata and OSP labels for the sample is available in the OSF repository.
 ## Replication Materials
 All code necessary for full replication of the study is available in the OSF repository at the following link:
 - https://osf.io/rvpc3/overview?view_only=0307dc0d99f74b50a738720a4a757aa0
 The Files contain three subfolders:
 - A code folder containing the full, **raw** quarto project with all code and data necessary for replication. A README file with instructions for replication is included in the code folder. This also includes a preprocessed dataset containing metadata and OSP labels for the sample, which can be used for replication of the analysis and figures in the main article without needing to run the full code. This version is more suitable for users with less experience in R, as it allows for a more straightforward replication of the results. A README file with instructions for replication is included in the data folder.
 - A manuscript folder containing the full manuscript quarto project that was used for all analyses and figures in the main article. 
 - The fully **rendered** Methodological Report, which contains all details on the methods used in the study, including the sampling approach, sample size determination, model training procedures, and evaluation metrics. The report also contains discussions of all decisions taken during the process, as well as the full descriptions and specifications of the models used and the preprocessing steps.
 # Sampling Approach
 The process involved the following steps:
@@ -218,19 +235,19 @@ Extraction Method: HTML
 Paywall Status: open_access
 ================================================================================
-TITLE: The passage of Australia’s data retention regime: national security, human rights, and media scrutiny
+TITLE: The passage of Australia's data retention regime: national security, human rights, and media scrutiny
 AUTHORS: Nicolas P. Suzor, Kylie Pappalardo, Natalie McIntosh
 ABSTRACT:
 Abstract
-            In 2015, the Australian government passed the Telecommunications (Interception and Access) Amendment (Data Retention) Act, which requires ISPs to collect metadata about their users and store this metadata for two years. From its conception, Australia’s data retention scheme has been controversial. In this article we examine how public interest concerns were addressed in Australian news media during the Act’s passage. The Act was ultimately passed with bipartisan support, despite serious deficiencies. We show how the Act’s complexity seemed to limit engaged critique in the mainstream media and how fears over terrorist attacks were exploited to secure the Act’s passage through parliament.
+            In 2015, the Australian government passed the Telecommunications (Interception and Access) Amendment (Data Retention) Act, which requires ISPs to collect metadata about their users and store this metadata for two years. From its conception, Australia's data retention scheme has been controversial. In this article we examine how public interest concerns were addressed in Australian news media during the Act's passage. The Act was ultimately passed with bipartisan support, despite serious deficiencies. We show how the Act's complexity seemed to limit engaged critique in the mainstream media and how fears over terrorist attacks were exploited to secure the Act's passage through parliament.
 FULL TEXT:
-Title: The passage of Australia’s data retention regime: national security, human rights, and media scrutiny
+Title: The passage of Australia's data retention regime: national security, human rights, and media scrutiny
 Abstract: Abstract
-            In 2015, the Australian government passed the Telecommunications (Interception and Access) Amendment (Data Retention) Act, which requires ISPs to collect metadata about their users and store this metadata for two years. From its conception, Australia’s data retention scheme has been controversial. In this article we examine how public interest concerns were addressed in Australian news media during the Act’s passage. The Act was ultimately passed with bipartisan support, despite serious deficiencies. We show how the Act’s complexity seemed to limit engaged critique in the mainstream media and how fears over terrorist attacks were exploited to secure the Act’s passage through parliament.
+            In 2015, the Australian government passed the Telecommunications (Interception and Access) Amendment (Data Retention) Act, which requires ISPs to collect metadata about their users and store this metadata for two years. From its conception, Australia's data retention scheme has been controversial. In this article we examine how public interest concerns were addressed in Australian news media during the Act's passage. The Act was ultimately passed with bipartisan support, despite serious deficiencies. We show how the Act's complexity seemed to limit engaged critique in the mainstream media and how fears over terrorist attacks were exploited to secure the Act's passage through parliament.
 This paper is part of Australian internet policy, a special issue of Internet Policy Review guest-edited by Angela Daly and Julian Thomas.
@@ -37,7 +37,7 @@ When evidence makes headlines, influences public opinions, shapes policing, sent
 > "Only by [...] repetitions can we convince ourselves that we are not dealing with a mere isolated 'coincidence', but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable." [@popperLogicScientificDiscovery2005, p. 23]
-To challenge bias and to support replication of research, a movement has formed within the scientific community, fueled by the "replication crisis" that was especially prevalent within the field of psychology [@dienlinAgendaOpenScience2021]. The open science movement tries to establish open science practices (OSPs) to challenge many of the known biases that endanger the reliability of the scientific process and enable access to the scientific discourse for a broader public @banksAnswers18Questions2019. The ongoing debate of the last decades was especially focused on two OSPs: 
+To challenge bias and to support replication of research, a movement has formed within the scientific community, fueled by the "replication crisis" that was especially prevalent within the field of psychology [@dienlinAgendaOpenScience2021]. The open science movement tries to establish open science practices (OSPs) to challenge many of the known biases that endanger the reliability of the scientific process and enable access to the scientific discourse for a broader public [@banksAnswers18Questions2019]. The ongoing debate of the last decades was especially focused on two OSPs: 
 *First*, openly sharing materials, data and code enables replication that reduces p-hacking, surfaces errors, spreads methodological knowledge and might reduce burdens on the researcher, driving broader adoption across science [@freeseAdvancesTransparencyReproducibility2022; @freeseReplicationStandardsQuantitative2007; @finkReplicationCodeAvailability2024]. *Second*, preregistration involves thoroughly outlining and documenting research plans and their rationale in a repository before conducting the research, reducing deliberate or unconscious decisions taken to improve findings, challenging publication bias and other biases [@managoPreregistrationRegisteredReports2023; @hardwickeReducingBiasIncreasing2023; @mertensPreregistrationAnalysesPreexisting2019]. 
@@ -55,7 +55,7 @@ But first, a closer look at the underlying issues leading to the recent developm
 # Background
-In his widely reviewed standard reading "Seven rules for social research", @4ff8afa9-5c92-3c50-b832-a1756ccbeedc emphasizes the importance of the reproduction of research findings. But already in the title of the chapter or the rule itself, Firebaugh cuts back on his appeal: "replicate *where possible*". He notes increasing data availability, yet acknowledges challenges for true replication. Given the books influence since 2008, one might expect replication and replication-enabling practices to be widely adopted today. But is this the case?
+In his widely reviewed standard reading "Seven rules for social research", @4ff8afa9-5c92-3c50-b832-a1756ccbeedc emphasizes the importance of the replication of research findings. But already in the title of the chapter or the rule itself, Firebaugh cuts back on his appeal: "replicate *where possible*". He notes increasing data availability, yet acknowledges challenges for true replication. Given the book's influence since 2008, one might expect replication and replication-enabling practices to be widely adopted today. But is this the case?
 Besides the theoretically driven discourse, there are quite tangible reasons to talk about the scientific method, replication and the publication process. Analyzing 77 research teams assessing the same dataset for a single hypothesis, @breznauObservingManyResearchers2022 found extremely diverse results, ranging from strong positive to strong negative outcomes. They termed this phenomenon "researcher degrees of freedom", explaining that most of the variance in results was not explained by assigned conditions, research decisions, or researcher characteristics. Instead, idiosyncratic researcher variability accounted for more than 90% of the variance.
@@ -63,7 +63,7 @@ This raises the question: if modern research practices are so prone to bias and
 ## From Replication Crisis to Credibility Revolution? {#sec-replication-crisis}
-The publication of Firebaugh's text coincided with the onset of the replication crisis, a period where widespread replication failures especially but not exclusively in psychology revealed systemic issues in research culture. This crisis wasn't limited to a few fraudulent cases but exposed a broader problem where seemingly robust, highly cited studies could not be reproduced. Examples ranged from unintended to outright data fabrication [@barghAutomaticitySocialBehavior1996; @callawayReportFindsMassive2011; @crockerRoadFraudStarts2011a]. While the crisis began in psychology, it soon spread to other fields like in political science and economics [@breznauDoesSociologyNeed2021]. For instance, a classic social priming study by @barghAutomaticitySocialBehavior1996, finding that participants primed with an "elderly" stereotype walked more slowly, failed to replicate. A follow-up-study suggested, that the original results were likely influenced by experimenter expectations rather than the hypothesized mechanism of unconscious priming [@doyenBehavioralPrimingIts2012]. While some extreme cases are well-documented, the crisis is largely seen as a result of  systemic pressure and normal human behavior or misconduct than in serious intent [@diekmannII2Probleme2022; @crockerRoadFraudStarts2011a; @4ff8afa9-5c92-3c50-b832-a1756ccbeedc].
+The publication of Firebaugh's text coincided with the onset of the replication crisis, a period where widespread replication failures especially but not exclusively in psychology revealed systemic issues in research culture. This crisis wasn't limited to a few fraudulent cases but exposed a broader problem where seemingly robust, highly cited studies could not be reproduced. Examples ranged from unintended errors to outright data fabrication [@barghAutomaticitySocialBehavior1996; @callawayReportFindsMassive2011; @crockerRoadFraudStarts2011a]. While the crisis began in psychology, it soon spread to other fields like political science and economics [@breznauDoesSociologyNeed2021]. For instance, a classic social priming study by @barghAutomaticitySocialBehavior1996, finding that participants primed with an "elderly" stereotype walked more slowly, failed to replicate. A follow-up study suggested that the original results were likely influenced by experimenter expectations rather than the hypothesized mechanism of unconscious priming [@doyenBehavioralPrimingIts2012]. While some extreme cases are well-documented, the crisis is largely seen as a result of systemic pressure - such as institutional incentives to "publish or perish" - and normal human behavior or misconduct rather than serious intent [@diekmannII2Probleme2022; @crockerRoadFraudStarts2011a; @4ff8afa9-5c92-3c50-b832-a1756ccbeedc].
 The term crisis not only implies alarmingly high proportions, but also creates pressure to act. This is supported by findings spanning many fields: Finance [@jensenThereReplicationCrisis2023], economics [@briggsPartialSolutionReplication2023], sociology [@auspurgAusmassUndRisikofaktoren2014] or medicine [@begleyRaiseStandardsPreclinical2012], with some authors even claiming that most published research findings in the social sciences are false [@ioannidisWhyMostPublished2005]. But what drives this crisis?
@@ -71,11 +71,11 @@ The term crisis not only implies alarmingly high proportions, but also creates p
 One of the earlier discussed practices that distort scientific progress is called publication bias. @rosenthalFileDrawerProblem1979 defines it as the preference for publishing positive over negative or inconclusive results. This often institutionally driven bias, also called the 'file drawer problem', can occur at any stage in research [@kuhbergerPublicationBiasPsychology2014; @francoPublicationBiasSocial2014]. Contributing practices include selective reporting, where null findings or variables are omitted from analysis [@breznauDoesSociologyNeed2021], and the post-hoc adaptation of hypotheses [@gerberPublicationBiasEmpirical2008]. But @breznauObservingManyResearchers2022 don't see publication bias as the main driver of the huge variance in results. Instead, they emphasize the role of idiosyncratic researcher variability and the broader context of research practices, which leads to the problem of practices that might produce unreliable or invalid results: so-called questionable research practices (QRPs). 
 A truth-incentivizing survey of over 2000 psychologists revealed a high prevalence of QRPs. Around 60% admitted to not reporting all dependent measures, 50% to selective reporting, and 30% to falsely claiming they predicted an unexpected finding. About 2% even confessed to data falsification [@johnMeasuringPrevalenceQuestionable2012a]. Criminology shows similar patterns, though with lower rates due to the absence of incentives [@chinQuestionableResearchPractices2023]. 
 In their excellent manifesto for reproducible science, @munafoManifestoReproducibleScience2017 conceptualize QRPs along the typical stages of empirical research. Common QRPs include HARKing (presenting an unexpected exploratory finding as a preplanned hypothesis), p-hacking (manipulating data or analysis to achieve a desired p-value) and selective reporting, that is, not reporting studies or variables that lack significant results. Other QRPs involve undisclosed data exclusion, stopping data collection when a desired result is found, or not reporting all conditions or measures used. These practices inflate false-positive rates and undermine research credibility [@auspurgAusmassUndRisikofaktoren2014; @breznauDoesSociologyNeed2021; @chinQuestionableResearchPractices2023].
-Other problematic practices involve the misuse of p-values, where researchers simply misinterpret the significance level as the likelihood of truth in their findings, leading to vast overconfidence in their results-that can also be a consequence of or lead to a failure to control for bias and poor quality control [@breznauDoesSociologyNeed2021; @munafoManifestoReproducibleScience2017]. Demographic, geographic or political biases and peer review limitations are more sources for error [@breznauDoesSociologyNeed2021; @grossmannOpenScienceReform2021]. Additionally, gendered penalties favor men publishing disproportionately more than women @akbaritabarGenderPatternsPublication2021. Misaligned institutional incentives, also accelerated by an intense competition for academic jobs, tenure and funding, lead to a so-called "publish or perish" culture [@smaldinoOpenScienceModified2019; @breznauDoesSociologyNeed2021]. 
+Problematic practices involve the misuse of p-values, where researchers simply misinterpret the significance level as the likelihood of truth in their findings, leading to vast overconfidence in their results. This can also be a consequence of, or lead to, a failure to control for bias and poor quality control [@breznauDoesSociologyNeed2021; @munafoManifestoReproducibleScience2017]. Demographic, geographic or political biases and peer review limitations are further sources of error [@breznauDoesSociologyNeed2021; @grossmannOpenScienceReform2021]. Additionally, gendered penalties favor men, who publish disproportionately more than women [@akbaritabarGenderPatternsPublication2021]. Misaligned institutional incentives, also accelerated by an intense competition for academic jobs, tenure and funding, lead to a so-called publish or perish culture [@smaldinoOpenScienceModified2019; @breznauDoesSociologyNeed2021]. 
 A truth-incentivizing survey of over 2000 psychologists revealed a high prevalence of QRPs. Around 60% admitted to not reporting all dependent measures, 50% to selective reporting, and 30% to falsely claiming they predicted an unexpected finding. About 2% even confessed to data falsification [@johnMeasuringPrevalenceQuestionable2012a]. Criminology shows similar patterns, though the study of @chinQuestionableResearchPractices2023 recorded lower rates of admission, likely because its design lacked the truth-incentivizing mechanisms used by @johnMeasuringPrevalenceQuestionable2012a.
 All the above leads to the conclusion that our institutions make refutation harder than confirmation. Open science (OS) is the design response, resetting defaults to transparency, pre-specification, and reproducibility. @munafoManifestoReproducibleScience2017 translate that philosophy into a lifecycle blueprint: blinding and preregistration, stronger methods training and independent oversight, open data, code and diversified peer review to harden reproducibility, evaluation and other measures. The central movement to address the above issues is the OS movement, devoting its effort to challenge publication bias, low statistical power, p-hacking, HARKing and other problems by increasing reproducibility and transparency [@grossmannOpenScienceReform2021]. 
@@ -83,15 +83,15 @@ All the above leads to the conclusion, that our institutions make refutation har
 Following an extensive literature review, @vicente-saezOpenScienceNow2018a characterize OS using four differentias: transparency in communication, accessibility or searchability to all data and materials, sharing of everything with a commitment to do so and collaboration along a scientific, distributed global dialogue throughout all stages involved in science. They integrate these into a succinct definition: "Open Science is transparent and accessible knowledge that is shared and developed through collaborative networks" [@vicente-saezOpenScienceNow2018a, p. 434]. @banksAnswers18Questions2019 establish a broader definition of OS that refers to many concepts, including scientific philosophies embodying communality and universalism and specific practices operationalizing these norms, including OS policies. A common ground is that *open* science and OSPs try to prevent research misconduct by simply increasing research transparency.
-Building on these definitions, in line with the work of many other authors from diverse disciplines [e.g. @dienlinAgendaOpenScience2021; and @greenspanOpenSciencePractices2024], there are numerous practices that have been proposed to enact OS.
+Building on these definitions, in line with the work of many other authors from diverse disciplines [e.g. @dienlinAgendaOpenScience2021 and @greenspanOpenSciencePractices2024], there are numerous practices that have been proposed to enact OS.
 ### Open Data and Open Materials
 *Open data* and *open materials* enable replication by publishing all materials necessary to reproduce research in detail, finding errors, bias or simply support the results of the replicated work [@dienlinAgendaOpenScience2021]. 
-**Open data** (OD) is defined as *the sharing of data that was collected, generated or obtained from a third party and processed to investigate the research question assessed in the publication*. Open materials are often shared alongside open data. To delineate a differentiated picture as sharing behavior for data and materials can be expected to differ due to for example privacy concerns, **open materials** (OM) are distinctively defined as *all research materials necessary to reproduce the reported results like notebooks, code or syntax, guides, protocols that can be shared digitally*. Both definitions closely follow the definitions given by the @americanpsychologicalassociationOpenScienceBadges.
+**Open data** (OD) is defined as *the sharing of data that was collected, generated or obtained from a third party and processed to investigate the research question assessed in the publication*. Open materials are often shared alongside open data. To delineate a differentiated picture as sharing behavior for data and materials can be expected to differ due to for example privacy concerns, **open materials** (OM) are distinctively defined as *all research materials necessary to reproduce the reported results like notebooks, code or syntax, guides, protocols that can be shared digitally*. Both definitions closely follow the definitions given by the @americanpsychologicalassociationOpenScienceBadges. 
-First, there is accumulating evidence that providing data alongside publications increases visibility and impact. Some estimates suggest around a 30% citation increase for papers that share data, and importantly, this advantage appears at least partly independent of JIF [@tennantAcademicEconomicSocietal2016; @banksAnswers18Questions2019]. Beyond citations, openly available datasets enable the exploration by others, supporting novel findings and exploratory, hypothesis-generating work [@piwowarSharingDetailedResearch2007; @piwowarStateOALargescale2018].
+OD and OM have several benefits. First, there is accumulating evidence that providing data alongside publications increases visibility and impact. Some estimates suggest around a 30% citation increase for papers that share data, and importantly, this advantage appears at least partly independent of JIF [@tennantAcademicEconomicSocietal2016; @banksAnswers18Questions2019]. Beyond citations, openly available datasets enable the exploration by others, supporting novel findings and exploratory, hypothesis-generating work [@piwowarSharingDetailedResearch2007; @piwowarStateOALargescale2018].
 Second, openness improves methodological rigor and documentation. Knowing that others will inspect our code, data, and decisions incentivizes clearer documentation, more careful workflows, and fewer statistical errors in final papers [@tennantAcademicEconomicSocietal2016; @banksAnswers18Questions2019]. This also promotes transparency about analytic choices and potential biases [@breznauDoesSociologyNeed2021].
@@ -109,7 +109,7 @@ In short, many systemic and researcher-centric challenges cut across OSPs - and
 A preregistration is a time-stamped plan for a study's hypotheses, design, and analysis, often made public. Its contents vary by method (e.g., hypotheses, sampling, interview guides, exclusion rules, analysis plans) [@loggPreregistrationWeighingCosts2021; @managoPreregistrationRegisteredReports2023; @americanpsychologicalassociationOpenScienceBadges].
-Timestamping restrains HARKing by separating predictions from evidence, reducing the flexibility for post-hoc theorizing [@scogginsMeasuringTransparencySocial2024; @loggPreregistrationWeighingCosts2021]. More broadly, by committing ex ante, researcher degrees of freedom are narrowed. The analytic and design choices that otherwise enable selective reporting or specification searching are constrained, and any deviations become visible to readers and reviewers. The same logic limits p-hacking: when transformations, outlier rules, model families, covariates, and confirmatory contrasts are specified in advance, cherry-picking becomes less feasible because analytical decisions are made independently of the data. Preregistration also addresses structural issues of study quality. Declaring sample-size requirements upfront helps prevent underpowered designs at construction [@kuhbergerPublicationBiasPsychology2014; @grossmannOpenScienceReform2021]. We predefine theory, measures, and analyses, seek early input, and document choices so reviewers can vet them and avoid misinterpretation-strengthening credibility [ @evansImprovingEvidencebasedPractice2023; @sarafoglouSurveyHowPreregistration2022; @scogginsMeasuringTransparencySocial2024]. Preregistration helps separate confirmatory from exploratory work, reduces publication bias (e.g., via Registered Reports), and narrows "researcher degrees of freedom" [@simmonsFalsePositivePsychologyUndisclosed2011].
+Timestamping restrains HARKing by separating predictions from evidence, reducing the flexibility for post-hoc theorizing [@scogginsMeasuringTransparencySocial2024; @loggPreregistrationWeighingCosts2021]. More broadly, committing ex ante narrows researcher degrees of freedom. The analytic and design choices that otherwise enable selective reporting or specification searching are constrained, and any deviations become visible to readers and reviewers. The same logic limits p-hacking: when transformations, outlier rules, model families, covariates, and confirmatory contrasts are specified in advance, cherry-picking becomes less feasible because analytical decisions are made independently of the data. Preregistration also addresses structural issues of study quality. Declaring sample-size requirements upfront helps prevent underpowered designs at construction [@kuhbergerPublicationBiasPsychology2014; @grossmannOpenScienceReform2021]. We predefine theory, measures, and analyses, seek early input, and document choices so reviewers can vet them and avoid misinterpretation, which strengthens credibility [@evansImprovingEvidencebasedPractice2023; @sarafoglouSurveyHowPreregistration2022; @scogginsMeasuringTransparencySocial2024]. Preregistration helps separate confirmatory from exploratory work, reduces publication bias (e.g., via Registered Reports), and narrows researcher degrees of freedom [@simmonsFalsePositivePsychologyUndisclosed2011].
 For this work, **preregistration** is defined as *the act of planning and documenting the hypotheses, study design, and analysis plan of a study before data is collected or even viewed. The documentation is typically time-stamped and made publicly available*.
@@ -125,13 +125,13 @@ In summary, preregistration does not constrain scientific creativity, it clarifi
 OA publishing offers several benefits. It increases accessibility and equity, as anyone with an internet connection can reach an OA article, potentially reducing inequalities for those at underfunded institutions [@banksAnswers18Questions2019]. There is a significant OA citation advantage: OA articles are cited more frequently than closed-access publications. This preference is now considered a form of research bias known as "FUTON" (full text on the net) bias [@piwowarStateOALargescale2018; @wentzVisibilityResearchFUTON2002; @piwowarSharingDetailedResearch2007]. OA also improves research quality by reducing the suppression of null findings [@francoPublicationBiasSocial2014] and enabling large-scale text and data mining [@tennantAcademicEconomicSocietal2016]. Furthermore, it accelerates equitable access, helping to bridge the global North-South divide, and enhances public accountability for publicly funded research [@tennantAcademicEconomicSocietal2016].
-Despite its benefits, OA faces challenges. Some newer or smaller Gold OA journals are perceived as less prestigious [@piwowarStateOALargescale2018], and concerns about "predatory publishers" have been mistakenly linked with OA [@tennantAcademicEconomicSocietal2016]. Article processing charges (APCs) can be a barrier for authors, particularly in low- and middle-income countries [@banksAnswers18Questions2019; @breznauDoesSociologyNeed2021], though roughly 70% of peer-reviewed OA journals are fee-free, and many offer waivers [@tennantAcademicEconomicSocietal2016; @breznauDoesSociologyNeed2021]. Publishers may also be hesitant to adopt OA due to concerns about losing subscription revenue. While OA promotes transparency, it cannot on its own solve issues like QRPs or underpowered studies if incentives continue to reward quantity over quality [@grossmannOpenScienceReform2021; @banksAnswers18Questions2019].
+Despite its benefits, OA faces challenges. Some newer or smaller Gold OA journals are perceived as less prestigious [@piwowarStateOALargescale2018], and concerns about "predatory publishers" have been mistakenly linked with OA [@tennantAcademicEconomicSocietal2016]. Article processing charges (APCs) can be a barrier for authors, particularly in low- and middle-income countries [@banksAnswers18Questions2019; @breznauDoesSociologyNeed2021], though roughly 70% of peer-reviewed OA journals are fee-free, and many offer waivers [@tennantAcademicEconomicSocietal2016; @breznauDoesSociologyNeed2021]. Publishers may also be hesitant to adopt OA due to concerns about losing subscription revenue. While OA promotes transparency, on its own it cannot solve issues like QRPs or underpowered studies if incentives continue to reward quantity over quality [@grossmannOpenScienceReform2021; @banksAnswers18Questions2019].
 ## Open Science in Criminology and Legal Psychology {#sec-osp-in-crim}
 A focused literature review on adoption produced limited evidence: we still know surprisingly little about how often OSPs are actually used in criminology and legal psychology. The evidence is fragmented, method-dependent, and sometimes contradictory, so estimates of prevalence are shaky, even as enthusiasm for OSPs is high and QRPs might appear common.
-Self-reports suggest high OSP familiarity - but they co-exist with widespread QRPs and are vulnerable to bias. In @chinQuestionableResearchPractices2023, 89% of respondents said they had used at least one OSP, yet 87% also admitted at least one QRP, and some serious QRPs (e.g., hiding known problems) were non-trivial. Survey data indicate that about 25% of researchers across fields have preregistered a study, with higher uptake in psychology (50-60%) and lower prevalence in sociology (~30%) [@fergusonSurveyOpenScience2023a]. Another survey in the field similarly estimated preregistration use at 45% (42-49%) [@chinQuestionableResearchPractices2023]. The reported prevalence of OD varies widely across disciplines. Survey data suggest that more than 60% of researchers report having posted data or code, with higher rates in psychology (>50%) compared to sociology (~35%) [@fergusonSurveyOpenScience2023a]. The prevalence of OM sharing is more limited compared to OD and access. Survey results indicate that 43% (40-47%) of researchers report providing access to their research materials [@chinQuestionableResearchPractices2023]. Few or no journals require data sharing in the field, coupled with rare preregistration and a tiny share of replication studies [@pridemoreReplicationCriminologySocial2018].
+Self-reports suggest high OSP familiarity, but they co-exist with widespread QRPs and are vulnerable to bias. In the study conducted by @chinQuestionableResearchPractices2023, 89% of respondents said they had used at least one OSP, yet 87% also admitted at least one QRP, and the share admitting some serious QRPs (e.g., hiding known problems) was non-trivial. Survey data indicate that about 25% of researchers across fields have preregistered a study, with higher uptake in psychology (50-60%) and lower prevalence in sociology (~30%) [@fergusonSurveyOpenScience2023a]. Another survey in the field similarly estimated preregistration use at 45% (42-49%) [@chinQuestionableResearchPractices2023]. The reported prevalence of OD varies widely across disciplines. Survey data suggest that more than 60% of researchers report having posted data or code, with higher rates in psychology (>50%) compared to sociology (~35%) [@fergusonSurveyOpenScience2023a]. OM sharing is less prevalent than OD and access. Survey results indicate that 43% (40-47%) of researchers report providing access to their research materials [@chinQuestionableResearchPractices2023]. Few or no journals in the field require data sharing, preregistration is rare, and replication studies make up only a tiny share of publications [@pridemoreReplicationCriminologySocial2018].
 In their survey at the Netherlands Institute for the Study of Crime and Law Enforcement, @moneva2025attitudes find broadly positive attitudes but divergent views by method and career stage, and a long list of cultural, structural, legal/privacy, and cost barriers. @fessingerStateOpenScience2025 also shows strong approval (88% positive) and some experience (58% tried at least one OSP), but routine adoption looks limited (only 44% even hold a repository account). In contrast, an assessment of social science studies between 2014 and 2017 found no preregistered studies at all [@hardwickeEmpiricalAssessmentTransparency2020]. 
@@ -143,7 +143,7 @@ The applied nature of the research in this field means fragile findings can driv
 # Data and Method
-The aim of this work is to compile a sample of publications in the fields of criminology and legal psychology, classify it as either statistical inference (SI) publications or non-SI publications and further examine the former to assess whether any of the OSPs under consideration are used: preregistration, OD, OM, or OA. OA results are reported as secondary, descriptive analyses to benchmark open-science adoption. The presented OSPs will be operationalized and a text-classification pipeline (keyword dictionaries and machine-learning models) will be used to detect them. OA status will be determined using publicly available metadata, given the expected high reliability of information on OA. The fine-tuned models are validated against a hand-coded sample that is extended using a large-language-model (LLM, ChatGPT 4o & ChatGPT 5o), with the product of both being then used to train classifier models that will classifiy the analytical sample to estimate true prevalences of OSPs.
+The aim of this work is to compile a sample of publications in the fields of criminology and legal psychology, classify each as either a statistical inference (SI) or a non-SI publication, and further examine the former to assess whether any of the OSPs under consideration are used: preregistration, OD, OM, or OA. OA results are reported as secondary, descriptive analyses to benchmark open-science adoption. The presented OSPs will be operationalized and a text-classification pipeline (keyword dictionaries and machine-learning models) will be used to detect them. OA status will be determined using publicly available metadata, given the expected high reliability of information on OA. The fine-tuned models are validated against a hand-coded sample that is extended using a large language model (LLM; ChatGPT 4o & ChatGPT 5o). The combined hand-coded and LLM-coded labels are then used to train classifier models that classify the analytical sample to estimate true prevalences of OSPs. The full research process is illustrated in @fig-flowchart-pipeline.
 Full-text data for training the machine learning classification models will be collected with a web application developed specifically for this project. Since software development is not the focus of this work, details of the app's architecture will not be discussed here. A brief description of the application, along with screenshots, is provided in the supplementary material.
@@ -165,6 +165,85 @@ The sampling procedure involved drawing a large enough sample for the training u
 The sample size was determined by a precision-based calculation targeting a confidence interval of $\pm$ 1.5 percentage points for the SI prevalence, since a precision-based calculation was deemed more suitable for an exploratory prevalence study [@blandTyrannyPowerThere2009]. Calculations were based on prevalences arbitrarily estimated using the results of the literature review described in @sec-osp-in-crim, as explained further in the provided supplements. The minimum total sample size was calculated as $\approx$ 4265 publications, which achieves a 95% confidence interval with a half-width of $\pm$ 1.5 pp using the @agrestiApproximateBetterExact1998 method.
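The precision-based calculation can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself runs in R), the function names are hypothetical, and it assumes a worst-case planning prevalence of 0.5, which is not necessarily the input used for the manuscript; the actual inputs are documented in the supplements.

```python
import math
from statistics import NormalDist


def agresti_coull_half_width(n, p, conf=0.95):
    """Half-width of the Agresti-Coull interval for n observations
    and an assumed (planning) prevalence p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + z ** 2                        # adjusted sample size
    p_adj = (n * p + z ** 2 / 2) / n_adj      # adjusted proportion
    return z * math.sqrt(p_adj * (1 - p_adj) / n_adj)


def min_sample_size(p, half_width, conf=0.95):
    """Smallest n whose Agresti-Coull half-width does not exceed the target."""
    n = 1
    while agresti_coull_half_width(n, p, conf) > half_width:
        n += 1
    return n


# Assumption: worst-case prevalence of 0.5, target half-width of 1.5 pp.
n_required = min_sample_size(p=0.5, half_width=0.015)
print(n_required)
```

Planning prevalences further from 0.5 require no more publications than this worst case, so the sketch gives an upper bound for a single proportion at the stated precision.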
 ```{mermaid}
 %%| label: fig-flowchart-pipeline
 %%| fig-cap: "A flowchart of the full research process. All steps described are further explained in the methodological report and the supplements."
 %%| fig-height: 8
 %%{init: {'theme': 'neutral'}}%%
 flowchart TD
    A["Population
    Top 100 JIF journals 2013-2023"]
    B["Crossref 
 Metadata filtering
 95,042 → 40,860 publications
 Deduplication, date filter, keyword exclusions"]
    C["Precision-based stratified sampling
 Target ±1.5 pp · n ≈ 4,265"]
    D["Sample A
 n = 408 · unstratified
 SI classifier training only"]
    E["Sample B
 analytical sample
 n = 4,265 · stratified by year"]
    F["Manual + LLM labelling
 Subset of Sample A
 κ ≈ .83 after reconciliation"]
    G["SI classifier trained
 Random Forest / XGBoost
 TF-IDF keyword features"]
    H1["Full-text retrieval
 HTML / PDF · custom web app"]
    H2["Full-text retrieval
 HTML / PDF · custom web app"]
    I["SI classifier applied to Sample B"]
    I2["SI papers identification
 Fine-tuned SI classifier
 n = 1,763 with usable full text"]
    OA["OA classified from metadata
 using Crossref, Web of Science, Scopus
 Covers full Sample B"]
    J["OSP training subset
 n = 352 · drawn from SI papers in Sample B
 Balanced by year
 manual & LLM labelling"]
    KOD["OD classifier
 RF / XGBoost · grid-tuned"]
    KOM["OM classifier
 RF / XGBoost · grid-tuned"]
    KPR["Preregistration classifier
 RF / XGBoost · grid-tuned"]
    L["Design-based prevalence estimates, Post-stratified by year, adjusted for misclassification"]
 A --> B
 B -- Training/Testing Sample --> D
 B -- Analytical Sample --> C
 D --> H1 --> F --> G
 C --> E --> H2 --> I
 G --> I
 E -- OA from metadata --> OA
 I --> I2 --> J
 J --> KOD & KOM & KPR
 KOD & KOM & KPR -- Applied to all SI papers --> L
 OA --> L
 J --> L
 ```
 First, Sample A, a random sample of up to around 500 publications, was manually classified to train the initial SI classifier. This step also helped estimate the effort for subsequent tasks. Next, an independent Sample B was drawn, stratified by year, thereby addressing problems in cross-validation and violations of the residual-independence assumption made by many machine-learning models [@robertsCrossvalidationStrategiesData2017].
 The SI classifier was then used to analyze and classify all publications in Sample B. From the identified SI papers in Sample B, a balanced dataset was randomly sampled to create a training set for the OSP classifiers. Finally, these trained OSP classifiers were applied to the entire analytical Sample B. While a publisher or journal-based stratification for the full sample would have been ideal, it was not feasible due to the limited number of available full texts.
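The final step in the flowchart (prevalence estimates post-stratified by year and adjusted for misclassification) can be sketched as follows. This section does not name the estimator, so the sketch assumes the Rogan-Gladen correction, a common choice for adjusting an observed prevalence for classifier sensitivity and specificity; the function names and all numbers are made up for illustration, and the code is Python rather than the project's R.

```python
def rogan_gladen(p_obs, sensitivity, specificity):
    """Adjust an observed prevalence for misclassification
    (Rogan-Gladen estimator; assumed here, not confirmed by the manuscript)."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier must perform better than chance")
    p_adj = (p_obs + specificity - 1) / denom
    return min(1.0, max(0.0, p_adj))          # clip to the valid range


def post_stratified_prevalence(strata):
    """Weight stratum prevalences by population counts.

    strata maps a stratum label (here: year) to a tuple
    (population_count, stratum_prevalence).
    """
    total = sum(count for count, _ in strata.values())
    return sum(count * p for count, p in strata.values()) / total


# Made-up example: two publication years, one classifier.
observed = {2013: (100, 0.10), 2014: (300, 0.20)}
adjusted = {
    year: (count, rogan_gladen(p, sensitivity=0.90, specificity=0.95))
    for year, (count, p) in observed.items()
}
estimate = post_stratified_prevalence(adjusted)
print(round(estimate, 3))
```

Applying the correction within each year before weighting keeps the two steps separate, which makes it easy to swap in year-specific sensitivity and specificity values if those are available.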
@@ -444,7 +523,7 @@ Downloading the analytical sample was mostly successful, though some publisher p
 ## Classification Tasks and Methods
-This section will present a brief summary of all methods used to classify the variables of interest. A thorough discussion of the decisions taken, the full descriptions and specifications of the models used as well as the preprocessing steps can be found in the supplied materials.
+This section will present a brief summary of all methods used to classify the variables of interest. A thorough discussion of all decisions taken, the full descriptions and specifications of the models used, as well as the preprocessing steps can be found in the replication materials available in the OSF repository.
 Since most existing classification approaches considered were deemed unsuitable for this scope (e.g., @kimResearchPaperClassification2019; @sanguansatFeatureMatricizationDocument2012; @jandotInteractiveSemanticFeaturing2016), this work instead relies on Random Forest and XGBoost models trained on a manually and LLM-coded subset of publications, as LLMs have shown good performance on similar classification tasks [@buntValidatingUseLarge2025; @zhaoAdvancingSingleMultitask2024].
@@ -1160,7 +1239,7 @@ Materials, Data and Code are made available at a public OSF-repository that can
 - https://osf.io/rvpc3/overview?view_only=0307dc0d99f74b50a738720a4a757aa0. 
-Further instructions can be found in the README files. Full-text data and the downloader can't be made available to the public due to copyright concerns.
+Further instructions can be found in the README files. Full-text data and the downloader can't be made available to the public due to copyright concerns. The labelled dataset containing metadata and OSP labels for the sample is available in the OSF repository. The code for the downloader is currently under revision and will be made available in the OSF repository as well.
 # Funding {.unnumbered}