closes #4; refined data collection

This commit is contained in:
Michael Beck 2024-12-18 18:28:11 +01:00
parent 95f1683326
commit 22989cf064
5 changed files with 1947 additions and 5 deletions


@ -107,7 +107,7 @@ The study will focus on papers in criminal psychology that use data and statisti
## Data Collection
The process of data collection will closely follow @scogginsMeasuringTransparencySocial2024 and begin with identifying relevant journals in criminal psychology. I will consult the Clarivate Journal Citation Report via their API to obtain a comprehensive list of journals within these fields by filtering for the top 30 journals in the respective fields (originally, @scogginsMeasuringTransparencySocial2024 used a top 100 filter; I will use the top 30 journals to limit the amount of data because of technical limitations in my workspace setup). To ensure feasibility, I will filter this list to include only journals that are accessible under the university's licensing agreements. Once the relevant journals are identified, I will use APIs such as Crossref, Scopus, or Web of Science to download metadata for all papers published between 2013 and 2023.
The process of data collection will closely follow @scogginsMeasuringTransparencySocial2024 and begin with identifying relevant journals in criminal psychology. I will consult the Clarivate Journal Citation Report to obtain a comprehensive list of journals within the fields by filtering for the top 100 journals. The Transparency and Openness Promotion Factor[^4] (TOP-Factor) according to @nosekPromotingOpenResearch2015 will then be used to assess each journal's adoption of open science practices, and the score will be included in the journal dataset. Once the relevant journals are identified, I will use APIs such as Crossref, Scopus, and Web of Science to download metadata for all papers published between 2013 and 2023.
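As a hedged illustration of the metadata step, a Crossref works query for one journal's 2013-2023 output could be built as follows. The ISSN is a placeholder and `crossref_works_url` is a helper introduced here for illustration, not part of any cited tool:

```python
# Sketch: build a Crossref REST API query for one journal's article
# metadata, filtered to the 2013-2023 study window.
import urllib.parse

CROSSREF_BASE = "https://api.crossref.org/journals"

def crossref_works_url(issn: str, rows: int = 100, cursor: str = "*") -> str:
    """Build a Crossref /works query restricted to 2013-2023."""
    params = {
        "filter": "from-pub-date:2013-01-01,until-pub-date:2023-12-31",
        "rows": rows,
        "cursor": cursor,  # deep paging: pass each response's next-cursor back in
    }
    return f"{CROSSREF_BASE}/{issn}/works?" + urllib.parse.urlencode(params)

# e.g. requests.get(crossref_works_url("0093-8548")).json()  # placeholder ISSN
```

Crossref's deep paging returns a `next-cursor` value in each response, which is passed back as the `cursor` parameter until no items remain.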
After obtaining the metadata, I will proceed to download the full-text versions of the identified papers. Whenever possible, I will prioritize HTML versions of the papers because their structured format simplifies subsequent text extraction. For papers that are not available in HTML, I will consider downloading full-text PDFs. Tools such as PyPaperBot or others[^1] can facilitate this process, although I will strictly adhere to ethical and legal guidelines, avoiding unauthorized sources like Sci-Hub or Anna's Archive and using only sources that are either included in my institution's campus license or available via open access. If access to full-text papers becomes a limiting factor, I will assess alternative strategies, such as collaborating with institutional libraries to request specific papers or identifying open-access repositories that may provide supplementary resources. Papers whose full text remains unavailable will receive their own category in the later analysis. Once all available full-text papers are collected, I will preprocess the data by converting HTML and PDF files into plain-text format using tools such as SciPDF Parser or others[^2]. This preprocessing step ensures that the text is in a standardized format suitable for analysis.
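A minimal sketch of the HTML-to-plain-text conversion, using only Python's standard library (the tools named above would handle PDFs and document structure far more robustly; this only shows the principle):

```python
# Sketch: strip markup from an HTML full text, keeping visible text only.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

The same plain-text output format would then be produced for PDFs via a PDF-specific extractor, so that the classification step sees a uniform input.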
@ -119,6 +119,8 @@ The proposed data collection is resource-intensive but serves multiple purposes.
[^3]: DDoS: Distributed Denial of Service, see @wangDDoSAttackProtection2015.
[^4]: The TOP-Factor according to @nosekPromotingOpenResearch2015 is a score that assesses a journal's adoption of open science practices; it can be obtained from [topfactor.org](https://topfactor.org/journals).
## Classification
The classification process will begin with operationalizing the key open science practices that I aim to study. This involves defining clear criteria for identifying papers in each of the categories I plan to classify: papers that use statistical inference, papers that applied preregistration, papers that applied open data practices, papers that offer open materials, and papers that are available via open access.
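As an illustration only, not the study's final coding scheme, such criteria could be operationalized as keyword patterns matched against the full text. Every pattern below is a hypothetical placeholder that would need validation against hand-coded papers:

```python
# Sketch: keyword-based operationalization of the five categories.
import re

# Hypothetical indicator patterns, to be refined and validated.
CRITERIA = {
    "statistical_inference": re.compile(r"\bp\s*[<=]|confidence interval", re.I),
    "preregistration": re.compile(r"preregist|registered report", re.I),
    "open_data": re.compile(r"data (are|is) (publicly )?available|osf\.io", re.I),
    "open_materials": re.compile(r"materials (are|is) available", re.I),
    "open_access": re.compile(r"open access", re.I),
}

def classify(text: str) -> set:
    """Return the set of categories whose indicator pattern matches."""
    return {name for name, pattern in CRITERIA.items() if pattern.search(text)}
```

A dictionary of compiled patterns keeps the criteria explicit and easy to audit, which matters more here than classification accuracy; a validated classifier could later replace the pattern table behind the same `classify` interface.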

lit.bib

@ -7232,6 +7232,21 @@
langid = {english}
}
@article{nosekPromotingOpenResearch2015,
title = {Promoting an Open Research Culture},
author = {Nosek, B. A. and Alter, G. and Banks, G. C. and Borsboom, D. and Bowman, S. D. and Breckler, S. J. and Buck, S. and Chambers, C. D. and Chin, G. and Christensen, G. and Contestabile, M. and Dafoe, A. and Eich, E. and Freese, J. and Glennerster, R. and Goroff, D. and Green, D. P. and Hesse, B. and Humphreys, M. and Ishiyama, J. and Karlan, D. and Kraut, A. and Lupia, A. and Mabry, P. and Madon, T. and Malhotra, N. and {Mayo-Wilson}, E. and McNutt, M. and Miguel, E. and Paluck, E. Levy and Simonsohn, U. and Soderberg, C. and Spellman, B. A. and Turitto, J. and VandenBos, G. and Vazire, S. and Wagenmakers, E. J. and Wilson, R. and Yarkoni, T.},
year = {2015},
month = jun,
journal = {Science},
volume = {348},
number = {6242},
pages = {1422--1425},
publisher = {American Association for the Advancement of Science},
doi = {10.1126/science.aab2374},
urldate = {2024-12-18},
file = {/home/michi/Zotero/storage/A32SAIJU/Nosek et al. - 2015 - Promoting an open research culture.pdf}
}
@article{nosekRegisteredReports2014,
title = {Registered {{Reports}}},
author = {Nosek, Brian A. and Lakens, Dani{\"e}l},


@ -9,10 +9,10 @@ OUT="${FILENAME}.pdf"
echo "Generating PDF..."
pandoc -i "$IN" \
-o "$OUT" \
--csl=apa-7th-edition.csl \
--csl=resources/apa-7th-edition.csl \
--citeproc \
--lua-filter=filters/first-line-indent.lua \
--citation-abbreviations=citation-abbreviations.csl
--citation-abbreviations=resources/citation-abbreviations.csl
# Check if pandoc ran successfully
if [ $? -ne 0 ]; then
@ -20,9 +20,9 @@ if [ $? -ne 0 ]; then
exit 1
fi
Insert Erklärung.pdf at the end of the PDF
# Insert Erklärung.pdf at the end of the PDF
echo "Modifying the PDF..."
./modify-pdf.sh "$OUT" "Erklärung.pdf" "$OUT"
./modify-pdf.sh "$OUT" "resources/Erklärung.pdf" "$OUT"
# remove last page for osf.io
echo "Removing last page for OSF.io output and saving to OSF-$OUT"

File diff suppressed because it is too large


@ -0,0 +1,8 @@
{ "default": {
"container-title": {
"European Social Survey European Research Infrastructure": "ESS ERIC",
"Bundeskriminalamt": "BKA",
"Scots Law Times": "SLT"
}
}
}