working ~9.250 words

2025-12-13 14:19:01 +01:00
parent 28739daea1
commit 9b80d36466
20 changed files with 8 additions and 7 deletions
@@ -167,6 +167,9 @@ if (isTRUE(debug_mode)) {

 An overestimation of the prevalence of each OSP in the population can cause problems in all following steps. The true prevalences and confidence intervals, along with performance diagnostics of the trained models, were assessed after all classification tasks were processed. Estimating the prevalences per year was not suitable, as no detailed information about those proportions was available. Instead, the established approach of stratifying the sample proportionally to the population was used [@larsenProportionalAllocationStrata2008]. 

+# Full Text Retrieval
+
+As mentioned in the manuscript, full texts were retrieved using a self-developed web application that combined web scraping with publisher APIs. Legal aspects were carefully considered throughout development. Within the EU, scraping is legal for scientific purposes [@urhg-60d-tdm], but institutional contracts can override this. Scraping from within the university network was therefore limited to publishers that permit it, while other publishers were scraped from outside the network. Technical details are available in the documents provided; the scraper itself might be made publicly available in the future. 

 # Model Training