From f88ff734bdc8039b5f1e48025785c52780274ba2 Mon Sep 17 00:00:00 2001
From: Michael Beck
Date: Mon, 16 Dec 2024 23:56:45 +0100
Subject: [PATCH] added tons of stuff, literature, corrected the makefile, added ResearchPlan, Notes and corrected readme

---
 Notes.md            |  147 +++++++
 ResearchPlan.md     |   93 ++++
 ResearchProposal.md |  140 +-----
 lit.bib             | 1004 +++++++++++++++++++++++++++++++++++++++++--
 make.sh             |    1 -
 modify-pdf.sh       |    0
 6 files changed, 1226 insertions(+), 159 deletions(-)
 create mode 100644 Notes.md
 create mode 100644 ResearchPlan.md
 mode change 100644 => 100755 modify-pdf.sh

diff --git a/Notes.md b/Notes.md
new file mode 100644
index 0000000..492ded9
--- /dev/null
+++ b/Notes.md
@@ -0,0 +1,147 @@
# Notes

## Research Plan

- **Problem:** using both sociology and criminology can introduce bias in the trained models due to the highly different vocabulary used in the two disciplines

According to @scogginsMeasuringTransparencySocial2024a
**Population**: \[social science\] papers using data and statistics

1. **Gathering Papers**
    1. Consult the Clarivate Journal Citation Report to obtain journals in the field
    2. Filter downloadable journals (that are included in the campus's licences)
    3. Using the [Crossref](https://github.com/ropensci/rcrossref), [Scopus](https://github.com/muschellij2/rscopus) or [WOS](https://github.com/juba/rwos) API: download publication metadata of all papers in the respective time span (see the sketch after this list)
    4. Download HTML papers
    5. Filter the to-download list by the grabbed HTML papers
    6. Download paper fulltext PDFs using [ferru97/PyPaperBot](https://github.com/ferru97/PyPaperBot) or [monk1337/resp](https://github.com/monk1337/resp) (even possible to use Anna's Archive, Sci-Hub or LibGen, but this would be illegal, so no, of course not) - **really necessary?**
    7. Convert HTML and PDF papers to txt ([titipata/scipdf\_parser](https://github.com/titipata/scipdf_parser), [aaronsw/html2text](https://github.com/aaronsw/html2text), [html2text · PyPI](https://pypi.org/project/html2text/))
2. **Classification**
    1. Operationalization of ...
        1. Papers that use statistical inference
        2. Papers that applied preregistration
        3. Papers that applied open data practices
        4. Papers that offer open materials
        5. Open access (theoretically not interesting?)
        6. Papers with positive results
    2. Definition of identification keywords/dictionaries for each category
    3. Manual classification of a number of papers for ML model training (between 1,000 and 2,000)
    4. Creation of [DFMs](https://quanteda.io/reference/dfm.html) (see also: [Official Tutorial](https://tutorials.quanteda.io/basic-operations/dfm/)) using the dictionaries
    5. ML model training (Naive Bayes, LogReg, nonlinear SVM, Random Forest, XGBoost)
    6. ML model evaluation / selection
    7. Classification of the data using the trained, best-performing model
3. **Analysis**
    - descriptive analysis of the temporal development of proportions over the last 10 years in each discipline, see @scogginsMeasuringTransparencySocial2024a
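A minimal sketch of the metadata download in step 1.3, assuming the [rcrossref](https://github.com/ropensci/rcrossref) package; the ISSN is a placeholder, not a journal from the final JCR-derived list:

```r
# Sketch of step 1.3: pull Crossref metadata for one journal.
# Assumes the rcrossref package; the ISSN below is a placeholder.
library(rcrossref)

issn <- "0003-1224"  # placeholder, to be replaced by the JCR-derived list

res <- cr_journals(
  issn       = issn,
  works      = TRUE,
  filter     = c(from_pub_date = "2013-01-01", until_pub_date = "2023-12-31"),
  limit      = 1000,   # Crossref's maximum page size
  cursor     = "*",    # deep paging through all records
  cursor_max = 50000
)

meta <- res$data  # one row per paper: DOI, title, container, license, links, ...
```

Looping this over the filtered journal list from step 1.2 would yield the to-download list used in steps 4-6.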
## Todo

- add stuff about the replication crisis, 1-2 sentences in the introduction. see @scogginsMeasuringTransparencySocial2024a
- **improve wording in the last paragraph**

## Open Access

[@dienlinAgendaOpenScience2021]
  1. publish materials, data and code
  2. preregister studies and submit registered reports
  3. conduct replication studies
  4. collaborate
  5. foster open science skills
  6. Implement Transparency and Openness Promotion (TOP) Guidelines
  7. incentivize open science practices

- Systemic Biases in AI and Big Data: Open science tools can be used to address biases in AI algorithms [@nororiAddressingBiasBig2021].

### **Publication Bias**, selective reporting [@smaldinoOpenScienceModified2019; @fox142OpenScience2021]
- **Problem:** Journals often favor publishing positive or statistically significant results, leaving negative or null findings unpublished.
- **How Open Science Helps:** Pre-registration of studies and publication of all research outcomes (e.g., via open access repositories) ensure that all results, including negative or null findings, are accessible. By promoting transparency and the sharing of data and methodologies, open science reduces the tendency to selectively report only favorable outcomes.

### **Confirmation Bias** [@fox142OpenScience2021]
- **Problem:** Researchers tend to seek out and favor evidence that confirms their prior hypotheses.
- **How Open Science Helps:** Open science practices, such as pre-registration of studies, help mitigate confirmation bias by specifying hypotheses and analysis plans before data collection.

### **Reproducibility Crisis** [@fox142OpenScience2021]
- **Problem:** Many scientific findings cannot be replicated due to opaque methodologies or unavailable data and code.
- **How Open Science Helps:** Sharing detailed methods, datasets, and analysis scripts in open repositories makes findings verifiable and allows other researchers to replicate them.

### **Algorithmic Bias** [@nororiAddressingBiasBig2021]
- **Problem:** AI models trained on skewed or unrepresentative data can reproduce and amplify the biases in that data.
- **How Open Science Helps:** Public training data and training reports for AI enable external auditing, so that biases can be detected and corrected.

### **Inefficiencies in Research Progress**
- **Problem:** Duplication of efforts and siloed research slow down scientific advancements.
- **How Open Science Helps:** Sharing negative results, datasets, and ongoing projects prevents duplication and accelerates innovation.

### **Overemphasis on Novelty**
- **Problem:** The pressure to publish novel findings discourages replication studies or incremental advancements.
- **How Open Science Helps:** Encouraging and funding replication studies through open peer-review processes shifts focus towards reliable and cumulative science.

### **Lack of Peer Review Transparency**
- **Problem:** Traditional peer review is often anonymous and lacks accountability, leading to potential biases or unfair evaluations.
- **How Open Science Helps:** Open peer review, where reviews and reviewer identities are accessible, ensures greater accountability and reduces bias.

### **Authorship and Credit Bias**
- **Problem:** Early-career researchers, women, and underrepresented groups often face challenges in receiving credit for their contributions.
- **How Open Science Helps:** Transparent contributions using tools like the Contributor Roles Taxonomy (CRediT) ensure that all contributors are recognized for their specific roles.
### **Conflicts of Interest**
- **Problem:** Undisclosed funding sources or affiliations may bias research findings.
- **How Open Science Helps:** Transparent declarations of conflicts of interest and funding sources reduce hidden biases.

### **Limited Interdisciplinary Collaboration**
- **Problem:** Barriers to sharing research outputs restrict interdisciplinary collaboration, limiting innovation.
- **How Open Science Helps:** Open sharing of data, methods, and publications fosters cross-disciplinary integration and innovation.

### **Data Access Inequality**
- **Problem:** Researchers in low-resource settings often lack access to expensive journals, datasets, or tools.
- **How Open Science Helps:** Open access publications and open data initiatives democratize access to research outputs, enabling equitable participation in science.

### **Misuse of Metrics (e.g., Impact Factor, h-Index)**
- **Problem:** Reliance on quantitative metrics for evaluating research quality skews scientific priorities.
- **How Open Science Helps:** Encouraging diverse evaluation metrics (e.g., open data reuse, societal impact) ensures fair assessment of research contributions.

### **Cherry-Picking and P-Hacking**
- **Problem:** Selective reporting or manipulating data to achieve statistical significance undermines the integrity of research.
- **How Open Science Helps:** Pre-registration of hypotheses and protocols discourages cherry-picking and promotes adherence to predefined analysis plans.

### **Lack of Public Engagement**
- **Problem:** Complex scientific outputs are often inaccessible to the general public, leading to mistrust or misunderstanding of science.
- **How Open Science Helps:** Open access and lay summaries of research make science more inclusive and comprehensible to non-specialists.

This commitment is rooted in the idea that scientific claims must be substantiated through consistent and reproducible evidence. Modern scientific inquiry, therefore, aligns with the notion that:

> "Only by ... repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable." [@popperLogicScientificDiscovery2005, p. 23]

diff --git a/ResearchPlan.md b/ResearchPlan.md
new file mode 100644
index 0000000..592cc0f
--- /dev/null
+++ b/ResearchPlan.md
@@ -0,0 +1,93 @@
# Data, Method, and Analysis of Open Science Practices in Sociology and Criminology Papers

## **Population**
- Papers in sociology and criminology utilizing data and statistical methods.
- Focus on evaluating open science practices:
  - Pre-registration
  - Open data
  - Open materials
  - Open access
  - Statistical inference
## **Data Collection**
1. **Journal Identification**
   - Use the Clarivate Journal Citation Report API to obtain a comprehensive list of sociology and criminology journals.
   - Filter the list to include journals accessible through university licensing agreements.

2. **Metadata Download**
   - Use the [Crossref](https://github.com/ropensci/rcrossref), [Scopus](https://github.com/muschellij2/rscopus) or [WOS](https://github.com/juba/rwos) APIs to download metadata for all papers published between 2013 and 2023.

3. **Full-Text Retrieval**
   - Download HTML versions of papers where available for ease of structured text extraction.
   - Use full-text PDFs when HTML is not available, adhering strictly to ethical and legal guidelines.
   - Tools for retrieval:
     - [ferru97/PyPaperBot](https://github.com/ferru97/PyPaperBot), [monk1337/resp](https://github.com/monk1337/resp) (licensed sources only).
     - Institutional library services for access.
     - Open-access repositories for additional resources.

4. **Preprocessing**
   - Convert collected papers to plain text using:
     - [SciPDF Parser](https://github.com/titipata/scipdf_parser) for PDF-to-text conversion.
     - HTML-to-text tools like [html2text](https://pypi.org/project/html2text/) ([aaronsw/html2text](https://github.com/aaronsw/html2text)).
   - Standardize the text format for subsequent analysis.

5. **Resource Management**
   - Address potential constraints:
     - Use scalable data collection methods.
     - Leverage institutional resources (e.g., libraries and repositories).
     - Implement efficient workflows for text extraction and preprocessing (multicore processing).

## **Classification**
1. **Operationalization**
   - Define clear criteria for identifying open science practices:
     - Pre-registration: Terms like "pre-registered."
     - Open data: Phrases like "data availability statement."
     - Open materials: Statements like "materials available on request."

2. **Keyword Dictionary Creation**
   - Develop dictionaries of terms and phrases associated with each open science practice.
   - Base dictionaries on prior research (e.g., @scogginsMeasuringTransparencySocial2024a).
   - Compare and join dictionaries.

3. **Manual Annotation**
   - Manually classify a subset of 1,000–2,000 papers for training machine learning models.
   - Use stratified sampling to ensure diversity in:
     - Journals
     - Publication years
     - Subfields within sociology and criminology.

4. **Feature Extraction**
   - Create document-feature matrices (DFMs) using the keyword dictionaries to prepare data for machine learning (see the sketch after this list).

5. **Model Training**
   - Train multiple machine learning models:
     - Naive Bayes
     - Logistic Regression
     - Support Vector Machines
     - Random Forests
     - Gradient Boosted Trees
   - Evaluate model performance to select the best classifier for each open science practice.

6. **Automated Classification**
   - Apply the best-performing models to classify the entire dataset.
   - Automate the identification of open science practices across all collected papers.
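A minimal sketch of steps 2 and 4, assuming [quanteda](https://quanteda.io); the dictionary entries are illustrative placeholders, not the final lists:

```r
# Sketch: keyword dictionary and document-feature matrix (steps 2 and 4).
# Dictionary entries are illustrative placeholders, not the final lists.
library(quanteda)

texts <- c(
  paper1 = "The study was preregistered and includes a data availability statement.",
  paper2 = "Materials are available on request; results were significant."
)

os_dict <- dictionary(list(
  preregistration = c("preregistered", "pre-registered", "preregistration"),
  open_data       = c("data availability statement", "data are openly available"),
  open_materials  = c("materials are available", "open materials"),
  inference       = c("significant", "confidence interval", "p-value")
))

toks <- tokens(texts)
# tokens_lookup() matches multi-word dictionary entries as phrases; the
# resulting DFM counts the dictionary hits per paper and category.
dfm_os <- dfm(tokens_lookup(toks, os_dict))
dfm_os
```

These per-category hit counts, together with the manual labels from step 3, form the training data for step 5.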
## **Analysis**
1. **Descriptive Analysis**
   - Examine temporal trends in the adoption of open science practices over the past decade.
   - Compare practices across sociology and criminology.
   - Compare practices across journals.

2. **Evaluation of Results**
   - Identify patterns in:
     - Prevalence of pre-registration, open data, open materials, and open access.
     - Statistical inference methods.

3. **Ethical Considerations**
   - Ensure all methodologies comply with ethical and legal guidelines.
   - Avoid unauthorized sources such as Sci-Hub or LibGen.

4. **Broader Implications**
   - Contribute to understanding the adoption of transparency and reproducibility in social sciences.
   - Inform efforts to promote open science practices in sociology, criminology, and beyond.
\ No newline at end of file
diff --git a/ResearchProposal.md b/ResearchProposal.md
index c7a81b1..558fb69 100644
--- a/ResearchProposal.md
+++ b/ResearchProposal.md
@@ -76,154 +76,54 @@ include-before: |
## Modern Science

-The rise of the internet in the last decades drastically changed our lives: Our ways of looking at the world, our social lives or our consumption patterns - the internet influences all spheres of life, whether we like it or not [@SocietyInternetHow2019]. The surge interconnectivity enabled a rise in movements that resist the classic definition of intellectual property rights: open source, open scholarship access and open science [@willinskyUnacknowledgedConvergenceOpen2005]. Modern technologies enhanced reliability, speed and efficiency in knowledge development, thereby enhancing communication, collaboration and access to information or data [@thagardInternetEpistemologyContributions1997; @eisendInternetNewMedium2002; @wardenInternetScienceCommunication2010]. The internet significantly facilitated formal and informal scholarly communication through electronic journals and digital repositories like Academia.edu or ResearchGate [@wardenInternetScienceCommunication2010; @waiteINTERNETKNOWLEDGEEXCHANGE2021]. Evidence also schows that an increase in access to the internet also increases research output [@xuImpactInternetAccess2021]. But greater output doesn't necessarily imply greater quality, progress or greater scientific discoveries. As availability and thereby the quantity of publications increased, the possible information overload demands for effective filtering and assessement of publicated results [@wardenInternetScienceCommunication2010].

+The rise of the internet in the last decades drastically changed our lives: Our ways of looking at the world, our social lives or our consumption patterns - the internet influences all spheres of life, whether we like it or not [@SocietyInternetHow2019]. The surge in interconnectivity enabled a rise in movements that resist the classic definition of intellectual property rights: open source, open scholarship access and open science [@willinskyUnacknowledgedConvergenceOpen2005]. Modern technologies enhanced reliability, speed and efficiency in knowledge development, thereby enhancing communication, collaboration and access to information or data [@thagardInternetEpistemologyContributions1997; @eisendInternetNewMedium2002; @wardenInternetScienceCommunication2010]. The internet significantly facilitated formal and informal scholarly communication through electronic journals and digital repositories like Academia.edu or ResearchGate [@wardenInternetScienceCommunication2010; @waiteINTERNETKNOWLEDGEEXCHANGE2021]. Evidence also shows that an increase in access to the internet also increases research output [@xuImpactInternetAccess2021]. But greater output doesn't necessarily imply greater quality, progress or greater scientific discoveries. As availability and thereby the quantity of publications increased, the possible information overload demands effective filtering and assessment of published results [@wardenInternetScienceCommunication2010].
But how do we define scientific progress? In the mid-20th century, Thomas Kuhn characterized scientific progress as revolutionary shifts between paradigms, the theories accepted by a scientific community at a given time. According to Kuhn, normal science operates within these paradigms, "solving puzzles" and refining theories. However, when anomalies arise that cannot be explained by the current paradigm, a crisis occurs, leading to a scientific revolution [@kuhnReflectionsMyCritics1970; @kuhnStructureScientificRevolutions1962]. Opposed to that, a critical rationalist approach to scientific progress emerged that saw danger in the process Kuhn described, as paradigms might facilitate confirmation bias and thereby stall progress. This view is embodied in Karl Popper's philosophy of science, which emphasizes falsifiability and the idea that scientific theories progress through conjectures and refutations rather than through paradigm shifts. Popper argued that science advances by eliminating false theories, thus moving closer to the truth in a more linear and cumulative manner [@popperLogicScientificDiscovery2005]. Where Kuhn emphasized the development of dominant theories, Popper suggested the challenging or falsification of those theories.

-Social sciences today engage in frequentist, deductive reasoning where significance testing is used to evaluate the null hypothesis, and conclusions are drawn based on the rejection or acceptance of this hypothesis, aligning with Popper's idea that scientific theories should be open to refutation. This approach is often criticized for its limitations in interpreting p-values and its reliance on long-run frequency interpretations [@UseMisuseClassicala; @wilkinsonTestingNullHypothesis2013]. In contrast, Bayesian inference is associated with inductive reasoning, where models are updated with new data to improve predictions. Bayesian methods allow for the comparison of competing models using tools like Bayes factors, but they do not directly falsify models through significance tests [@gelmanInductionDeductionBaysian2011; @dollBayesianModelSelection2019]. Overall, while falsification remains a cornerstone of scientific methodology, contemporary science often employs a pluralistic approach, integrating various methods to address complex questions and advance knowledge [@rowbottomKuhnVsPopper2011]. This pluralistic approach in contemporary science underscores the importance of integrating diverse methodologies to tackle complex questions and enhance our understanding. Despite the differences between frequentist and Bayesian methods, both share a fundamental commitment to the rigorous testing and validation of scientific theories.

+Social sciences today engage in frequentist, deductive reasoning where significance testing is used to evaluate the null hypothesis, and conclusions are drawn based on the rejection or acceptance of this hypothesis, aligning with Popper's idea that scientific theories should be open to refutation. This approach is often criticized for its limitations in interpreting p-values and its reliance on long-run frequency interpretations [@dunleavyUseMisuseClassical2021; @wilkinsonTestingNullHypothesis2013]. In contrast, Bayesian inference is associated with inductive reasoning, where models are updated with new data to improve predictions.
Bayesian methods allow for the comparison of competing models using tools like Bayes factors, but they do not directly falsify models through significance tests [@gelmanInductionDeductionBaysian2011; @dollBayesianModelSelection2019]. Overall, while falsification remains a cornerstone of scientific methodology, contemporary science often employs a pluralistic approach, integrating diverse methodologies to address complex questions and advance knowledge [@rowbottomKuhnVsPopper2011]. Despite the differences between frequentist and Bayesian methods, both share a fundamental commitment to the rigorous testing and validation of scientific theories.

But despite the more theoretically driven discourse about scientific discovery, there are many tangible reasons to talk about the scientific method and the publication process. A recent, highly cited article revealed that only a small proportion of the variance in the outcomes of studies based on the same data can be attributed to the choices researchers make in designing their tests. @breznauObservingManyResearchers2022 observed 77 researcher teams analyzing the same dataset to assess the same hypothesis and found that the results ranged from strongly positive to strongly negative. Assigned conditions, research decisions and researcher characteristics explained less than 50% of the between-team deviance; the rest of the variance remained unexplained. This underlines the importance of transparent research: results are prone to many errors and biases, made intentionally or unintentionally by the researcher or induced by the publisher.

> "Only by ... repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable." [@popperLogicScientificDiscovery2005, p. 23]

To challenge the biases and to support the possibility of these "repetitions" or replications of research, a movement has formed within the scientific community, fuelled by the "replication crisis" that was especially prevalent within the field of psychology [@dienlinAgendaOpenScience2021]. The open science movement tries to establish open science practices to challenge many of the known biases that endanger the reliability of the scientific process.

-@banksAnswers18Questions2019 establish a definition of open science as a broad term that refers to many concepts including scientific philosophies emobodying communality and universalism, specific practices operationalizing these norms including open science policies like sharing of data and analytic files, redifinition of confidence thresholds, pre-registration of studies and analytical plans, engagement in replication studies, removal of pay-walls, incentive systems to encourage the above practices and even specific citation standards. This typology is in line with the work of many other authors from diverse disciplines [e.g. @dienlinAgendaOpenScience2021; and @greenspanOpenSciencePractices2024]. The two dominant, highly discussed approaches in open science are open data and preregistration.
+@banksAnswers18Questions2019 establish a definition of open science as a broad term that refers to many concepts including scientific philosophies embodying communality and universalism, specific practices operationalizing these norms including open science policies like sharing of data and analytic files, redefinition of confidence thresholds, pre-registration of studies and analytical plans, engagement in replication studies, removal of pay-walls, incentive systems to encourage the above practices and even specific citation standards. This typology is in line with the work of many other authors from diverse disciplines [e.g. @dienlinAgendaOpenScience2021; and @greenspanOpenSciencePractices2024]. The two dominant, highly discussed approaches in open science are open data and preregistration.

**Publishing materials, data and code** or *open data* is necessary to enable replication of the studies. Replication thereby makes it possible to assess the pursued research in detail, find errors, bias or even support the results [@dienlinAgendaOpenScience2021]. While many researchers see challenges in the publication of their data and materials due to a potentially higher workload, legal concerns or just lack of interest, many of these concerns could be alleviated by streamlined processes or institutional support [@freeseAdvancesTransparencyReproducibility2022; @freeseReplicationStandardsQuantitative2007]. As open data reduces p-hacking, facilitates new research by enabling reproduction, reveals mistakes in the coding process and enables a diffusion of knowledge on the research process, it seems that many researchers, journals and other institutions are starting to adopt open data in their research [@dienlinAgendaOpenScience2021; @finkReplicationCodeAvailability; @freeseAdvancesTransparencyReproducibility2022; @zenk-moltgenFactorsInfluencingData2018; @matternWhyAcademicsUndershare2024].

-**Preregistration** involves thoroughly outlining and documenting research plans and their rationale in a repository. These plans can be made publicly accessible when the researcher decides to share them. The specifics of preregistration can vary based on the research type and may encompass elements such as hypotheses, sampling strategies, interview guides, exclusion criteria, study design, and analysis plans [@managoPreregistrationRegisteredReports2023]. Within this definition, a preregistration shall not prevent exploratory research. Deviations from the research plan are still allowed but shall be communicated transparently [@managoPreregistrationRegisteredReports2023; @nosekRegisteredReports2014]. Preregistration impacts research in multiple ways : it helps performing exploratory and confirmatory research independently, protects against publication bias as journals tipically commit to publish registered research and counters "researchers' degrees of freedom" in data analysis by reducing overfitting through cherry-picking, variable swapping, flexible model selection and subsampling [@mertensPreregistrationAnalysesPreexisting2019; @FalsePositivePsychologyUndisclosed]. This minimizes the risk of bias by promoting decision-making that is independent of outcomes. It also enhances transparency, allowing others to evaluate the potential for bias and adjust their confidence in the research findings accordingly [@hardwickeReducingBiasIncreasing2023].

+**Preregistration** involves thoroughly outlining and documenting research plans and their rationale in a repository.
These plans can be made publicly accessible when the researcher decides to share them. The specifics of preregistration can vary based on the research type and may encompass elements such as hypotheses, sampling strategies, interview guides, exclusion criteria, study design, and analysis plans [@managoPreregistrationRegisteredReports2023]. Within this definition, a preregistration shall not prevent exploratory research. Deviations from the research plan are still allowed but shall be communicated transparently [@managoPreregistrationRegisteredReports2023; @nosekRegisteredReports2014]. Preregistration impacts research in multiple ways: it helps researchers perform exploratory and confirmatory research independently, protects against publication bias as journals typically commit to publishing registered research and counters "researchers' degrees of freedom" in data analysis by reducing overfitting through cherry-picking, variable swapping, flexible model selection and subsampling [@mertensPreregistrationAnalysesPreexisting2019; @FalsePositivePsychologyUndisclosed]. This minimizes the risk of bias by promoting decision-making that is independent of outcomes. It also enhances transparency, allowing others to evaluate the potential for bias and adjust their confidence in the research findings accordingly [@hardwickeReducingBiasIncreasing2023].

-My initial plan for my master's thesis was to study the effect of pre-registration on reported effect sizes. During my initial literature review, it appeared to me that there were very few publications that used pre-registration in data-driven criminology and sociology. Instead of assessing effect sizes, this raised the question: **How have open science practices been adapted within sociology and criminolgy? How has the use of these practices developed over the last decade?**

+My initial plan for my master's thesis was to study the effect of pre-registration on reported effect sizes. During my initial literature review, it appeared to me that there were very few publications that used pre-registration in data-driven criminology and sociology. Instead of assessing effect sizes, this raised the question: **How have open science practices been adopted within sociology and criminology? How has the use of these practices developed over the last decade?**

-@scogginsMeasuringTransparencySocial2024a did an extensive analysis of almost 100,000 publications in political science and international relations. They found an increasing use of preregistration and open data, with levels still being relatively low. The extensive research not only revealed the current state of open science in political science, but also generated rich data to perform further meta research. Therefore, I intend to apply similar methods in the field of sociology and criminology. In the following section I describe the intended data collection and research methods that are highly based on @scogginsMeasuringTransparencySocial2024a research.

+@scogginsMeasuringTransparencySocial2024 did an extensive analysis of nearly 100,000 publications in political science and international relations. They observed an increasing use of preregistration and open data, with overall levels still relatively low. The extensive research not only revealed the current state of open science in political science, but also generated rich data for further meta-research.
+
+I intend to apply similar methods in the field of sociology and criminology: gather data about papers in a subset of criminology and sociology journals, classify those papers by their application of open science practices and explore the patterns over time to take stock of research practices in the disciplines. In the following section I describe the intended data collection and research methods, which are closely based on the research of @scogginsMeasuringTransparencySocial2024.

# Data and Method

-- **Problem** using both sociology and criminology can introduce bias of the trained models due to highly different vocabulary used in both discioplines
-
-According to @scogginsMeasuringTransparencySocial2024a
-**Population**: \[social science\] papers using data and statistics
-1. **Gathering Papers**
    1. Consult Clarivate Journal Citation Report to obtain Journals in the field
    2. Filter downloadable journals (that are included in the campus' licences)
    3. Using [Crossref](https://github.com/ropensci/rcrossref), [Scopus](https://github.com/muschellij2/rscopus) or [WOS](https://github.com/juba/rwos) API: download publication metadata of all papers in a respective time span
    4. Download HTML Papers
    5. Filter to-download list by grabbed html papers
    6. Download Paper Fulltext PDF: using [ferru97/PyPaperBot](https://github.com/ferru97/PyPaperBot), [ monk1337/resp](https://github.com/monk1337/resp) (even possible to use anna's archive, scihub or libgen, but this would be illegal so no ofc not) - **really necessary?**
    7. Convert HTML and PDF papers to txt ([titipata/scipdf\_parser](https://github.com/titipata/scipdf_parser), [aaronsw/html2text](https://github.com/aaronsw/html2text), [html2text · PyPI](https://pypi.org/project/html2text/))
-2. Classification
    1. Operationalization of ...
        1. Papers that use statistical inference
        2. Papers that applied preregistration
        3. Papers that applied open data practices
        4. Papers that offer open materials
        5. Open Access (theoretically not interesting?)
        6. Papers with Positive Results
    2. Definition of Identification keywords/dictionaries for each category
    3. Manual classification of a number of papers for ml model training (between 1k/2k)
    4. Creation of [DFMs](https://quanteda.io/reference/dfm.html) using the dictionaries
    5. MLM training (Naive Bayes, LogReg, Nonlinear SVM, Random Forest, XGB)
    6. MLM evaluation / decision
    7. Classification of data using the trained, best performing model
-3. Analysis
    - One of the two:
    - descriptive analysis of the temporal development in proportions in the last 10 years in each discipline, see @scogginsMeasuringTransparencySocial2024a
    - Intergroup Comparison of effect sizes of a randomly drawn sample of the data gathered. Effect sizes could also be gathered using a trained

+The study will focus on papers in sociology and criminology that use data and statistical methods. The aim is to evaluate the prevalence of key open science practices, including pre-registration, open data, open materials, open access, statistical inference, and the reporting of positive results.

## Data Collection

-Why the huge data collection effort?
- preparation for further research, database might be useful for other research questions
- I want to practice R / ML-Methods.
- By-hand collection of data on open science practices is very time consuming. why not generate data from the texts?
- From @akkerPreregistrationSecondaryData2021: "To create a control group for comparison with the preregistered studies in our sample, we linked each preregistered publication in our sample to a non-preregistered publication. We did so by checking Web of Science’s list of related papers for every preregistered publication and selecting the first non-preregistered publication from that list that used primary quantitative data and was published in the same year as the related preregistered publication." i think this is kind of questionable.

+The process of data collection will closely follow @scogginsMeasuringTransparencySocial2024 and begin with identifying relevant journals in sociology and criminology. I will consult the Clarivate Journal Citation Report via its API and filter for the top 30 journals in each field (@scogginsMeasuringTransparencySocial2024 originally used a top 100 filter; I will use the top 30 to limit the amount of data because of technical limitations in my workspace setup). To ensure feasibility, I will further restrict this list to journals that are accessible under the university’s licensing agreements. Once the relevant journals are identified, I will use APIs such as Crossref, Scopus, or Web of Science to download metadata for all papers published from 2013 to 2023.

-## Todo

+After obtaining the metadata, I will proceed to download the full-text versions of the identified papers. Whenever possible, I will prioritize downloading HTML versions of the papers due to their structured format, which simplifies subsequent text extraction. For papers that are not available in HTML, I will consider downloading full-text PDFs. Tools such as PyPaperBot can facilitate this process, although I will adhere strictly to ethical and legal guidelines, avoiding unauthorized sources like Sci-Hub or LibGen. If access to full-text papers becomes a limiting factor, I will assess alternative strategies such as collaborating with institutional libraries to request specific papers or identifying open-access repositories that may provide supplementary resources. Papers whose full text remains unavailable will be treated as a category of their own in the later analysis. Once all available full-text papers are collected, I will preprocess the data by converting HTML and PDF files into plain text format using tools such as SciPDF Parser or html2text (a sketch of this step follows below). This preprocessing step ensures that the text is in a standardized format suitable for analysis.

-- add stuff about the replication crisis, 1-2 sentences in the introduction. see @scogginsMeasuringTransparencySocial2024a
-- **improve wording in the last paragraph**

+The proposed data collection is resource-intensive but serves multiple purposes, as the resulting database may also support further research questions. Resource constraints could nevertheless pose challenges, such as limited access to computational tools or delays in obtaining full-text papers. To mitigate these risks, I plan to prioritize scalable data collection methods, limit data collection to a manageable extent and use existing institutional resources, including library services and open-access repositories. Additionally, I will implement efficient preprocessing workflows, ensuring that the project remains feasible within the given timeline and resources.
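The plan names Python tools for this step (SciPDF Parser, html2text); since the rest of the pipeline leans on R packages, a minimal R-side sketch using the `pdftools` and `rvest` packages as counterparts, with placeholder file paths:

```r
# Sketch: convert collected papers to plain text.
# pdftools/rvest stand in for SciPDF Parser/html2text; paths are placeholders.
library(pdftools)
library(rvest)

pdf_to_text <- function(path) {
  paste(pdf_text(path), collapse = "\n")  # pdf_text() returns one string per page
}

html_to_text <- function(path) {
  html_text2(read_html(path))             # strips markup, keeps readable whitespace
}

# Example usage (placeholder file names):
# txt <- pdf_to_text("papers/smith_2019.pdf")
# txt <- html_to_text("papers/smith_2019.html")
```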
-# Notes

+## Classification

-[@dienlinAgendaOpenScience2021]
  1. publish materials, data and code
  2. preregister studies and submit registered reports
  3. conduct replication studies
  4. collaborate
  5. foster open science skills
  6. Implement Transparency and Openness Promotion (TOP) Guidelines
  7. incentivize open science practices

+The classification process will begin with operationalizing the key open science practices that I aim to study. This involves defining clear criteria for identifying papers that fall into the four categories I plan to classify: papers that use statistical inference, papers that applied preregistration, papers that applied open data practices, and papers that offer open materials. For instance, terms like “pre-registered,” “open data,” or “data availability statement” could indicate adherence to pre-registration or open data practices. Similarly, phrases such as “materials available on request” or “open materials” could signify the use of open materials. The freely available data of @scogginsMeasuringTransparencySocial2024 will form the foundation of keyword dictionaries for identifying relevant papers during the classification phase. To facilitate this, I will additionally develop my own keyword dictionaries for each category, identifying terms and phrases commonly associated with these practices before consulting @scogginsMeasuringTransparencySocial2024.

+To train machine learning models capable of classifying the papers, I will manually annotate a subset of papers. The sample size will be determined using weighted fitting of learning curves according to @figueroaPredictingSampleSize2012, which requires an initial hand-coded sample of 100-200 papers. If the necessary sample size exceeds my time constraints, I will try to use clustering-based text classification to extend the training sample [@zengCBCClusteringBased2003]. To ensure the representativeness of this subset, I will sample papers proportionally from different journals, publication years, and subfields within sociology and criminology. The stratified sampling approach will help mitigate biases and ensure that the training data reflects the diversity of the overall dataset. The sampled subset will serve as a "labeled" dataset for supervised learning. Other classification methods were considered but deemed unsuitable for the task, as they were either designed for document topic classification or too time-intensive for a master's thesis [e.g. @kimResearchPaperClassification2019; @sanguansatFeatureMatricizationDocument2012; @jandotInteractiveSemanticFeaturing2016]. Instead, I will follow a two-stage approach that has been used in other fields for highly specialized document classification tasks [@abdollahiOntologybasedTwoStageApproach2019]: using the manually labeled data, I will construct document-feature matrices (DFMs) based on the predefined keyword dictionaries, in line with @scogginsMeasuringTransparencySocial2024. I will then train various machine learning models, including Naive Bayes, Logistic Regression, Support Vector Machines, and Gradient Boosted Trees. The performance of each model will be evaluated to identify the best-performing classifier for each category of open science practices (a sketch of this step follows below). Once the optimal models are selected, I will use them to classify the entire dataset of papers.

+The automated classification will enable me to categorize papers based on their adoption of open science practices. This classification will provide the foundation for subsequent analyses of temporal trends and other patterns within the data. Automating the classification process mitigates the inefficiency of manual data collection, allowing for the analysis of a significantly larger dataset than would otherwise be feasible.
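A compressed sketch of the training and comparison step, with simulated dictionary-hit features and labels standing in for the hand-coded sample:

```r
# Sketch: compare classifiers on labeled DFM features.
# Features and labels are simulated stand-ins for the hand-coded sample.
set.seed(42)
library(e1071)  # naiveBayes(), svm()

n <- 500
feats <- data.frame(
  prereg_hits    = rpois(n, 0.4),  # dictionary hit counts per paper
  open_data_hits = rpois(n, 0.8)
)
labels <- factor(ifelse(feats$prereg_hits + rnorm(n, sd = 0.5) > 0.5, "yes", "no"))

train <- sample(n, 0.8 * n)

nb <- naiveBayes(feats[train, ], labels[train])
sv <- svm(feats[train, ], labels[train], kernel = "radial")
lr <- glm(labels ~ ., data = cbind(feats, labels)[train, ], family = binomial)

# Held-out accuracy as a first comparison; per-class F1 scores would follow
# before selecting one model per open science practice.
acc <- function(pred) mean(pred == labels[-train])
acc(predict(nb, feats[-train, ]))
acc(predict(sv, feats[-train, ]))
acc(ifelse(predict(lr, feats[-train, ], type = "response") > 0.5, "yes", "no"))
```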
## Analysis

-### **Publication Bias**, selective reporting [@smaldinoOpenScienceModified2019; @fox142OpenScience2021].
- **Problem:** Journals often favor publishing positive or statistically significant results, leaving negative or null findings unpublished.
- **How Open Science Helps:** Pre-registration of studies and publishing all research outcomes (e.g., via open access repositories) ensures that all results are accessible. Open science encourages the publication of all results, including negative or null findings, which helps reduce the bias towards publishing only positive results. By promoting transparency and the sharing of data and methodologies, open science reduces the tendency to selectively report only favorable outcomes

-### **Confirmation Bias** [@fox142OpenScience2021]
- **Problem:**
- **How Open Science Helps:** Open science practices, such as pre-registration of studies, help mitigate confirmation bias by specifying hypotheses and analysis plans before data collection

-### **Reproducibility Crisis** [@fox142OpenScience2021]
- **Problem:** Many scientific findings cannot be replicated due to opaque methodologies or unavailable data and code.
- **How Open Science Helps:** Sharing detailed methods, datasets, and analysis scripts in open repositories promotes reproducibility and verification. Open science addresses the reproducibility crisis by making data and methods openly available, allowing other researchers to verify and replicate findings.

-### **Algorithmic Bias** [@nororiAddressingBiasBig2021]
- **Problem:**
- **How Open Science Helps:** public data and training reports for ai enable

-### **Inefficiencies in Research Progress**
- **Problem:** Duplication of efforts and siloed research slow down scientific advancements.
- **How Open Science Helps:** Sharing negative results, datasets, and ongoing projects prevents duplication and accelerates innovation.

-### **Overemphasis on Novelty**
- **Problem:** The pressure to publish novel findings discourages replication studies or incremental advancements.
- **How Open Science Helps:** Encouraging and funding replication studies through open peer-review processes shifts focus towards reliable and cumulative science.

-### **Lack of Peer Review Transparency**
- **Problem:** Traditional peer review is often anonymous and lacks accountability, leading to potential biases or unfair evaluations.
- **How Open Science Helps:** Open peer review, where reviews and reviewer identities are accessible, ensures greater accountability and reduces bias.

-### **Authorship and Credit Bias**
- **Problem:** Early-career researchers, women, and underrepresented groups often face challenges in receiving credit for their contributions.
- **How Open Science Helps:** Transparent contributions using tools like the Contributor Roles Taxonomy (CRediT) ensure that all contributors are recognized for their specific roles.

-### **Conflicts of Interest**
- **Problem:** Undisclosed funding sources or affiliations may bias research findings.
- **How Open Science Helps:** Transparent declarations of conflicts of interest and funding sources reduce hidden biases.

-### **Limited Interdisciplinary Collaboration**
- **Problem:** Barriers to sharing research outputs restrict interdisciplinary collaboration, limiting innovation.
- **How Open Science Helps:** Open sharing of data, methods, and publications fosters cross-disciplinary integration and innovation.

-### **Data Access Inequality**
- **Problem:** Researchers in low-resource settings often lack access to expensive journals, datasets, or tools.
- **How Open Science Helps:** Open access publications and open data initiatives democratize access to research outputs, enabling equitable participation in science.

-### **Misuse of Metrics (e.g., Impact Factor, h-Index)**
- **Problem:** Reliance on quantitative metrics for evaluating research quality skews scientific priorities.
- **How Open Science Helps:** Encouraging diverse evaluation metrics (e.g., open data reuse, societal impact) ensures fair assessment of research contributions.

-### **Cherry-Picking and P-Hacking**
- **Problem:** Selective reporting or manipulating data to achieve statistical significance undermines the integrity of research.
- **How Open Science Helps:** Pre-registration of hypotheses and protocols discourages cherry-picking and promotes adherence to predefined analysis plans.

-### **Lack of Public Engagement**
- **Problem:** Complex scientific outputs are often inaccessible to the general public, leading to mistrust or misunderstanding of science.
- **How Open Science Helps:** Open access and lay summaries of research make science more inclusive and comprehensible to non-specialists.

-This commitment is rooted in the idea that scientific claims must be substantiated through consistent and reproducible evidence. Modern scientific inquiry, therefore, aligns with the notion that:

-> "Only by ... repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable." [@popperLogicScientificDiscovery2005, p. 23]

+In the analysis phase of the research, an exploratory analysis will examine temporal trends in the adoption of open science practices over the past decade. This involves comparing the adoption rates of practices such as pre-registration, open data, open materials, and open access across the disciplines of sociology and criminology, as well as among different journals (a sketch of such a trend analysis follows below). The goal is to identify any significant differences or similarities in how these practices have been embraced over time. This evaluation aims to uncover insights into the methodological rigor and transparency within the fields, providing a comprehensive understanding of the current landscape and potential areas for improvement in research practices. By building on the methods developed by @scogginsMeasuringTransparencySocial2024, I hope to generate findings that will inform future efforts to promote transparency and reproducibility in the social sciences.
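A minimal sketch of such a trend analysis, assuming the classified dataset has one row per paper (simulated stand-in data here):

```r
# Sketch: temporal trend in the share of papers per practice and discipline.
# `papers` is a simulated stand-in for the classified dataset.
set.seed(1)
library(dplyr)
library(ggplot2)

papers <- data.frame(
  discipline    = sample(c("sociology", "criminology"), 5000, replace = TRUE),
  year          = sample(2013:2023, 5000, replace = TRUE),
  preregistered = runif(5000) < 0.04  # placeholder base rate
)

trends <- papers |>
  group_by(discipline, year) |>
  summarise(share = mean(preregistered), .groups = "drop")

ggplot(trends, aes(year, share, colour = discipline)) +
  geom_line() +
  labs(x = "Publication year", y = "Share of papers with preregistration")
```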
\newpage diff --git a/lit.bib b/lit.bib index c84fb14..b6a5004 100644 --- a/lit.bib +++ b/lit.bib @@ -1,3 +1,17 @@ +@inproceedings{abdollahiOntologybasedTwoStageApproach2019, + title = {An {{Ontology-based Two-Stage Approach}} to {{Medical Text Classification}} with {{Feature Selection}} by {{Particle Swarm Optimisation}}}, + booktitle = {2019 {{IEEE Congress}} on {{Evolutionary Computation}} ({{CEC}})}, + author = {Abdollahi, Mahdi and Gao, Xiaoying and Mei, Yi and Ghosh, Shameek and Li, Jinyan}, + year = {2019}, + month = jun, + pages = {119--126}, + doi = {10.1109/CEC.2019.8790259}, + urldate = {2024-12-16}, + abstract = {Document classification (DC) is the task of assigning pre-defined labels to unseen documents by utilizing a model trained on the available labeled documents. DC has attracted much attention in medical fields recently because many issues can be formulated as a classification problem. It can assist doctors in decision making and correct decisions can reduce the medical expenses. Medical documents have special attributes that distinguish them from other texts and make them difficult to analyze. For example, many acronyms and abbreviations, and short expressions make it more challenging to extract information. The classification accuracy of the current medical DC methods is not satisfactory. The goal of this work is to enhance the input feature sets of the DC method to improve the accuracy. To approach this goal, a novel two-stage approach is proposed. In the first stage, a domain-specific dictionary, namely the Unified Medical Language System (UMLS), is employed to extract the key features belonging to the most relevant concepts such as diseases or symptoms. In the second stage, PSO is applied to select more related features from the extracted features in the first stage. The performance of the proposed approach is evaluated on the 2010 Informatics for Integrating Biology and the Bedside (i2b2) data set which is a widely used medical text dataset. The experimental results show substantial improvement by the proposed method on the accuracy of classification.}, + keywords = {Conceptualization,Diseases,Feature extraction,Feature Selection,Medical Text Classification,Ontology,Particle swarm optimization,Particle Swarm Optimization,Task analysis,Text mining,Unified modeling language}, + file = {/home/michaelb/Zotero/storage/IG9J8G67/Abdollahi et al. - 2019 - An Ontology-based Two-Stage Approach to Medical Text Classification with Feature Selection by Partic.pdf;/home/michaelb/Zotero/storage/MLFVZT8V/8790259.html} +} + @article{abtRegisteredReportsJournal2021, title = {Registered {{Reports}} in the {{Journal}} of {{Sports Sciences}}}, author = {Abt, Grant and Boreham, Colin and Davison, Gareth and Jackson, Robin and Wallace, Eric and Williams, A Mark}, @@ -10,12 +24,29 @@ publisher = {Routledge}, issn = {0264-0414}, doi = {10.1080/02640414.2021.1950974}, - url = {https://doi.org/10.1080/02640414.2021.1950974}, urldate = {2024-11-06}, pmid = {34379576}, file = {/home/michaelb/Zotero/storage/RKLIRD6R/Abt et al. 
- 2021 - Registered Reports in the Journal of Sports Sciences.pdf} } +@article{akbaritabarGenderPatternsPublication2021, + title = {Gender {{Patterns}} of {{Publication}} in {{Top Sociological Journals}}}, + author = {Akbaritabar, Aliakbar and Squazzoni, Flaminio}, + year = {2021}, + month = may, + journal = {Science, Technology, \& Human Values}, + volume = {46}, + number = {3}, + pages = {555--576}, + publisher = {SAGE Publications Inc}, + issn = {0162-2439}, + doi = {10.1177/0162243920941588}, + urldate = {2024-12-15}, + abstract = {This article examines publication patterns over the last seventy years from the American Sociological Review and American Journal of Sociology, the two most prominent journals in sociology. We reconstructed the gender of all published authors and each author's academic pedigree. Results would suggest that these journals published disproportionally more articles by male authors and their coauthors. These gender inequalities persisted even when considering citations and after controlling for the influence of academic affiliation. It would seem that the potentially positive advantage of working in a prestigious, elite sociology department, in terms of better learning environment and reputational signal, for higher publication opportunities only significantly benefits male authors. While our findings do not mean that these journals have biased internal policies or implicit practices, this publication pattern needs to be considered especially regarding the possibility of their ``social closure'' and isomorphism.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/Z2P2N3KM/Akbaritabar and Squazzoni - 2021 - Gender Patterns of Publication in Top Sociological Journals.pdf} +} + @article{akkerPreregistrationSecondaryData2021, title = {Preregistration of Secondary Data Analysis: {{A}} Template and Tutorial}, shorttitle = {Preregistration of Secondary Data Analysis}, @@ -26,7 +57,6 @@ volume = {5}, issn = {2003-2714}, doi = {10.15626/MP.2020.2625}, - url = {https://open.lnu.se/index.php/metapsychology/article/view/2625}, urldate = {2024-11-06}, abstract = {Preregistration has been lauded as one of the solutions to the so-called `crisis of confidence' in the social sciences and has therefore gained popularity in recent years. However, the current guidelines for preregistration have been developed primarily for studies where new data will be collected. Yet, preregistering secondary data analyses--- where new analyses are proposed for existing data---is just as important, given that researchers' hypotheses and analyses may be biased by their prior knowledge of the data. The need for proper guidance in this area is especially desirable now that data is increasingly shared publicly. In this tutorial, we present a template specifically designed for the preregistration of secondary data analyses and provide comments and a worked example that may help with using the template effectively. Through this illustration, we show that completing such a template is feasible, helps limit researcher degrees of freedom, and may make researchers more deliberate in their data selection and analysis efforts.}, copyright = {Copyright (c) 2021 Olmo van den Akker, Sara Weston, Lorne Campbell, Bill Chopik, Rodica Damian, Pamela Davis-Kean, Andrew Hall, Jessica Kosie, Elliott Kruse, Jerome Olsen, Stuart Ritchie, KD Valentine, Anna van 't Veer, Marjan Bakker}, @@ -35,6 +65,171 @@ file = {/home/michaelb/Zotero/storage/YH9JQF8M/Akker et al. 
- 2021 - Preregistration of secondary data analysis A template and tutorial.pdf} } +@article{auspurgAusmassUndRisikofaktoren2014, + title = {{Ausma{\ss} und Risikofaktoren des Publication Bias in der deutschen Soziologie}}, + author = {Auspurg, Katrin and Hinz, Thomas and Schneck, Andreas}, + year = {2014}, + month = dec, + journal = {KZfSS K{\"o}lner Zeitschrift f{\"u}r Soziologie und Sozialpsychologie}, + volume = {66}, + number = {4}, + pages = {549--573}, + issn = {1861-891X}, + doi = {10.1007/s11577-014-0284-3}, + urldate = {2024-12-15}, + abstract = {Die statistische Signifikanz von Forschungsergebnissen wird oft f{\"a}lschlicherweise als ein Indikator f{\"u}r deren Relevanz und Aussagekraft gehalten. Signifikante Ergebnisse werden eher ver{\"o}ffentlicht, obwohl nicht-signifikante Ergebnisse gleicherma{\ss}en f{\"u}r den Erkenntnisfortschritt bedeutsam sind. Die Folgen sind eine {\"U}bersch{\"a}tzung von Effektst{\"a}rken und eine zu optimistische Beurteilung von Theorien. Im vorliegenden Beitrag wird dem Problem des Publication Bias (PB) in der deutschen Soziologie anhand von elf Jahrg{\"a}ngen der zwei wichtigsten deutschsprachigen Soziologie-Zeitschriften (K{\"o}lner Zeitschrift f{\"u}r Soziologie und Sozialpsychologie, Zeitschrift f{\"u}r Soziologie) mithilfe des Caliper-Tests nachgegangen. Lassen sich ebenso wie in US-amerikanischen Soziologie-Zeitschriften Hinweise auf einen PB finden, und wenn ja, unter welchen Bedingungen ist dieser besonders stark ausgepr{\"a}gt? Im Mittelpunkt der Ursachenanalyse stehen M{\"o}glichkeiten der Datenmanipulation sowie der sozialen Kontrolle durch Forschende. Im Ergebnis finden sich auch f{\"u}r die deutsche Soziologie Hinweise auf einen PB, wenngleich in schw{\"a}cherem Umfang als in US-amerikanischen Zeitschriften. Einfache Ma{\ss}nahmen wie Herausgebervorgaben, wonach Daten f{\"u}r Replikationen zur Verf{\"u}gung zu stellen sind, zeigen keine durchschlagende Wirkung. Es l{\"a}sst sich lediglich eine leichte Tendenz feststellen, dass komplexe Arbeiten mit mehreren parallel zu testenden Hypothesen das PB-Risiko abmildern.}, + langid = {ngerman}, + keywords = {Caliper test,Caliper-Test,Publication bias,Publication Bias,Rational-choice,Rational-Choice,Significance testing,Signifikanztest,Sociology of science,Wissenschaftssoziologie}, + file = {/home/michaelb/Zotero/storage/BZEYCCXC/Auspurg et al. - 2014 - Ausmaß und Risikofaktoren des Publication Bias in der deutschen Soziologie.pdf} +} + +@article{banksAnswers18Questions2019, + title = {Answers to 18 {{Questions About Open Science Practices}}}, + author = {Banks, George C. and Field, James G. and Oswald, Frederick L. and O'Boyle, Ernest H. and Landis, Ronald S. and Rupp, Deborah E. and Rogelberg, Steven G.}, + year = {2019}, + month = jun, + journal = {Journal of Business and Psychology}, + volume = {34}, + number = {3}, + pages = {257--270}, + issn = {1573-353X}, + doi = {10.1007/s10869-018-9547-8}, + urldate = {2024-12-16}, + abstract = {Open science refers to an array of practices that promote openness, integrity, and reproducibility in research; the merits of which are being vigorously debated and developed across academic journals, listservs, conference sessions, and professional associations. The current paper identifies and clarifies major issues related to the use of open science practices (e.g., data sharing, study pre-registration, open access journals). 
We begin with a useful general description of what open science in organizational research represents and adopt a question-and-answer format. Through this format, we then focus on the application of specific open science practices and explore future directions of open science. All of this builds up to a series of specific actionable recommendations provided in conclusion, to help individual researchers, reviewers, journal editors, and other stakeholders develop a more open research environment and culture.}, + langid = {english}, + keywords = {Open science,Philosophy of science,Questionable research practices,Research ethics}, + file = {/home/michaelb/Zotero/storage/C7RSDC77/Banks et al. - 2019 - Answers to 18 Questions About Open Science Practices.pdf} +} + +@inbook{berners-leeIsntItSemantic2011, + title = {Isn't It {{Semantic}}?}, + booktitle = {Leaders in {{Computing}}: {{Changing}} the Digital {{World}}}, + author = {{Berners-Lee}, Tim}, + year = {2011}, + month = sep, + series = {{{EBO Ser}}}, + publisher = {British Computer Society, The Turpin Distribution Services Limited [distributor]}, + address = {Swindon, Biggleswade}, + urldate = {2024-03-11}, + collaborator = {Knuth, Donald and Booch, Grady and Torvalds, Linus and Wozniak, Steve and Cerf, Vint and Sp{\"a}rck Jones, Karen and {Berners-Lee}, Tim and Wales, Jimmy and Shirley, Stephanie}, + isbn = {978-1-78017-099-2}, + langid = {english}, + annotation = {OCLC: 808089194} +} + +@article{breznauDoesSociologyNeed2021, + title = {Does {{Sociology Need Open Science}}?}, + author = {Breznau, Nate}, + year = {2021}, + month = mar, + journal = {Societies}, + volume = {11}, + number = {1}, + pages = {9}, + publisher = {Multidisciplinary Digital Publishing Institute}, + issn = {2075-4698}, + doi = {10.3390/soc11010009}, + urldate = {2024-12-15}, + abstract = {Reliability, transparency, and ethical crises pushed many social science disciplines toward dramatic changes, in particular psychology and more recently political science. This paper discusses why sociology should also change. It reviews sociology as a discipline through the lens of current practices, definitions of sociology, positions of sociological associations, and a brief consideration of the arguments of three highly influential yet epistemologically diverse sociologists: Weber, Merton, and Habermas. It is a general overview for students and sociologists to quickly familiarize themselves with the state of sociology or explore the idea of open science and its relevance to their discipline.}, + copyright = {http://creativecommons.org/licenses/by/3.0/}, + langid = {english}, + keywords = {crisis of science,Habermas,Merton,open science,p-hacking,publication bias,replication,research ethics,science community,sociology legitimation,transparency,Weber}, + file = {/home/michaelb/Zotero/storage/26AZJE4S/Breznau - 2021 - Does Sociology Need Open Science.pdf} +} + +@article{breznauObservingManyResearchers2022, + title = {Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty}, + author = {Breznau, Nate and Rinke, Eike Mark and Wuttke, Alexander and Nguyen, Hung H. V. and Adem, Muna and Adriaans, Jule and {Alvarez-Benjumea}, Amalia and Andersen, Henrik K. and Auer, Daniel and Azevedo, Flavio and Bahnsen, Oke and Balzer, Dave and Bauer, Gerrit and Bauer, Paul C. and Baumann, Markus and Baute, Sharon and Benoit, Verena and Bernauer, Julian and Berning, Carl and Berthold, Anna and Bethke, Felix S. 
and Biegert, Thomas and Blinzler, Katharina and Blumenberg, Johannes N. and Bobzien, Licia and Bohman, Andrea and Bol, Thijs and Bostic, Amie and Brzozowska, Zuzanna and Burgdorf, Katharina and Burger, Kaspar and Busch, Kathrin B. and {Carlos-Castillo}, Juan and Chan, Nathan and Christmann, Pablo and Connelly, Roxanne and Czymara, Christian S. and Damian, Elena and Ecker, Alejandro and Edelmann, Achim and Eger, Maureen A. and Ellerbrock, Simon and Forke, Anna and Forster, Andrea and Gaasendam, Chris and Gavras, Konstantin and Gayle, Vernon and Gessler, Theresa and Gnambs, Timo and Godefroidt, Am{\'e}lie and Gr{\"o}mping, Max and Gro{\ss}, Martin and Gruber, Stefan and Gummer, Tobias and Hadjar, Andreas and Heisig, Jan Paul and Hellmeier, Sebastian and Heyne, Stefanie and Hirsch, Magdalena and Hjerm, Mikael and Hochman, Oshrat and H{\"o}vermann, Andreas and Hunger, Sophia and Hunkler, Christian and Huth, Nora and Ign{\'a}cz, Zs{\'o}fia S. and Jacobs, Laura and Jacobsen, Jannes and Jaeger, Bastian and Jungkunz, Sebastian and Jungmann, Nils and Kauff, Mathias and Kleinert, Manuel and Klinger, Julia and Kolb, Jan-Philipp and Ko{\l}czy{\'n}ska, Marta and Kuk, John and Kuni{\ss}en, Katharina and Kurti Sinatra, Dafina and Langenkamp, Alexander and Lersch, Philipp M. and L{\"o}bel, Lea-Maria and Lutscher, Philipp and Mader, Matthias and Madia, Joan E. and Malancu, Natalia and Maldonado, Luis and Marahrens, Helge and Martin, Nicole and Martinez, Paul and Mayerl, Jochen and Mayorga, Oscar J. and McManus, Patricia and McWagner, Kyle and Meeusen, Cecil and Meierrieks, Daniel and Mellon, Jonathan and Merhout, Friedolin and Merk, Samuel and Meyer, Daniel and Micheli, Leticia and Mijs, Jonathan and Moya, Crist{\'o}bal and Neunhoeffer, Marcel and N{\"u}st, Daniel and Nyg{\aa}rd, Olav and Ochsenfeld, Fabian and Otte, Gunnar and Pechenkina, Anna O. and Prosser, Christopher and Raes, Louis and Ralston, Kevin and Ramos, Miguel R. and Roets, Arne and Rogers, Jonathan and Ropers, Guido and Samuel, Robin and Sand, Gregor and Schachter, Ariela and Schaeffer, Merlin and Schieferdecker, David and Schlueter, Elmar and Schmidt, Regine and Schmidt, Katja M. and {Schmidt-Catran}, Alexander and Schmiedeberg, Claudia and Schneider, J{\"u}rgen and Schoonvelde, Martijn and {Schulte-Cloos}, Julia and Schumann, Sandy and Schunck, Reinhard and Schupp, J{\"u}rgen and Seuring, Julian and Silber, Henning and Sleegers, Willem and Sonntag, Nico and Staudt, Alexander and Steiber, Nadia and Steiner, Nils and Sternberg, Sebastian and Stiers, Dieter and Stojmenovska, Dragana and Storz, Nora and Striessnig, Erich and Stroppe, Anne-Kathrin and Teltemann, Janna and Tibajev, Andrey and Tung, Brian and Vagni, Giacomo and Van Assche, Jasper and {van der Linden}, Meta and {van der Noll}, Jolanda and Van Hootegem, Arno and Vogtenhuber, Stefan and Voicu, Bogdan and Wagemans, Fieke and Wehl, Nadja and Werner, Hannah and Wiernik, Brenton M. and Winter, Fabian and Wolf, Christof and Yamada, Yuki and Zhang, Nan and Ziller, Conrad and Zins, Stefan and {\.Z}{\'o}{\l}tak, Tomasz}, + year = {2022}, + month = nov, + journal = {Proceedings of the National Academy of Sciences}, + volume = {119}, + number = {44}, + pages = {e2203150119}, + publisher = {Proceedings of the National Academy of Sciences}, + doi = {10.1073/pnas.2203150119}, + urldate = {2024-12-15}, + abstract = {This study explores how researchers' analytical choices affect the reliability of scientific findings. Most discussions of reliability problems in science focus on systematic biases. 
We broaden the lens to emphasize the idiosyncrasy of conscious and unconscious decisions that researchers make during data analysis. We coordinated 161 researchers in 73 research teams and observed their research decisions as they used the same data to independently test the same prominent social science hypothesis: that greater immigration reduces support for social policies among the public. In this typical case of social science research, research teams reported both widely diverging numerical findings and substantive conclusions despite identical start conditions. Researchers' expertise, prior beliefs, and expectations barely predict the wide variation in research outcomes. More than 95\% of the total variance in numerical results remains unexplained even after qualitative coding of all identifiable decisions in each team's workflow. This reveals a universe of uncertainty that remains hidden when considering a single study in isolation. The idiosyncratic nature of how researchers' results and conclusions varied is a previously underappreciated explanation for why many scientific hypotheses remain contested. These results call for greater epistemic humility and clarity in reporting scientific findings.}, + file = {/home/michaelb/Zotero/storage/5WU4WFFE/Breznau et al. - 2022 - Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertain.pdf} +} + +@misc{britannicaLinusTorvalds2023, + title = {Linus {{Torvalds}}}, + author = {Britannica, The Editors of Encyclopaedia}, + year = {2023}, + month = dec, + journal = {Encyclopedia Britannica}, + urldate = {2024-03-11}, + abstract = {Linus Torvalds, Finnish computer scientist who was the principal force behind the development of the Linux operating system. In 1991 he made the Linux software available for free downloading, and he released the source code, which meant that anyone could modify Linux to suit their own purposes.}, + howpublished = {https://www.britannica.com/biography/Linus-Torvalds}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/LPJ8RFCL/Linus-Torvalds.html} +} + +@misc{britannicaLinux2024, + title = {Linux}, + author = {Britannica, The Editors of Encyclopaedia}, + year = {2024}, + month = mar, + journal = {Encyclopedia Britannica}, + urldate = {2024-03-11}, + abstract = {Linux, computer operating system created in the early 1990s by Finnish software engineer Linus Torvalds and the Free Software Foundation. 
Because it is open-source, and thus modifiable for different uses, Linux is popular for systems as diverse as cellular telephones and supercomputers.}, + howpublished = {https://www.britannica.com/technology/Linux}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/NKKAMTZ5/Linux.html} +} + +@incollection{brownAttributedEinstein2019, + title = {Attributed to {{Einstein}}}, + booktitle = {The {{Ultimate Quotable Einstein}}}, + author = {Brown, Rita Mae}, + editor = {Calaprice, Alice}, + year = {2019}, + month = dec, + pages = {471--486}, + publisher = {Princeton University Press}, + doi = {10.1515/9780691207292-025}, + urldate = {2024-12-11}, + isbn = {978-0-691-20729-2}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/FYEPELPQ/Einstein - 2019 - Attributed to Einstein.pdf} +} + +@misc{cernBirthWebCERN, + title = {The Birth of the {{Web}} {\textbar} {{CERN}}}, + author = {{CERN}}, + journal = {CERN - The birth of the Web}, + urldate = {2024-03-11}, + howpublished = {https://home.cern/science/computing/birth-web}, + file = {/home/michaelb/Zotero/storage/T7US7PIX/birth-web.html} +} + +@misc{cernCerninfochTimBernersLees, + title = {Cern.Info.Ch - {{Tim Berners-Lee}}'s Proposal}, + author = {{CERN}}, + urldate = {2024-03-11}, + howpublished = {https://info.cern.ch/Proposal.html}, + file = {/home/michaelb/Zotero/storage/ULVHPAES/Proposal.html} +} + @article{chinQuestionableResearchPractices2023a, title = {Questionable {{Research Practices}} and {{Open Science}} in {{Quantitative Criminology}}}, author = {Chin, Jason M. and Pickett, Justin T. and Vazire, Simine and Holcombe, Alex O.}, year = {2023}, @@ -46,7 +241,6 @@ pages = {21--51}, issn = {1573-7799}, doi = {10.1007/s10940-021-09525-6}, - url = {https://doi.org/10.1007/s10940-021-09525-6}, urldate = {2024-11-06}, abstract = {Questionable research practices (QRPs) lead to incorrect research results and contribute to irreproducibility in science. Researchers and institutions have proposed open science practices (OSPs) to improve the detectability of QRPs and the credibility of science.
We examine the prevalence of QRPs and OSPs in criminology, and researchers' opinions of those practices.}, langid = {english}, @@ -66,13 +260,113 @@ pages = {211037}, publisher = {Royal Society}, doi = {10.1098/rsos.211037}, - url = {https://royalsocietypublishing.org/doi/10.1098/rsos.211037}, urldate = {2024-11-06}, abstract = {Preregistration is a method to increase research transparency by documenting research decisions on a public, third-party repository prior to any influence by data. It is becoming increasingly popular in all subfields of psychology and beyond. Adherence to the preregistration plan may not always be feasible and even is not necessarily desirable, but without disclosure of deviations, readers who do not carefully consult the preregistration plan might get the incorrect impression that the study was exactly conducted and reported as planned. In this paper, we have investigated adherence and disclosure of deviations for all articles published with the Preregistered badge in Psychological Science between February 2015 and November 2017 and shared our findings with the corresponding authors for feedback. Two out of 27 preregistered studies contained no deviations from the preregistration plan. In one study, all deviations were disclosed. Nine studies disclosed none of the deviations. We mainly observed (un)disclosed deviations from the plan regarding the reported sample size, exclusion criteria and statistical analysis. This closer look at preregistrations of the first generation reveals possible hurdles for reporting preregistered studies and provides input for future reporting guidelines. We discuss the results and possible explanations, and provide recommendations for preregistered research.}, keywords = {open science,preregistration,psychological science,researcher degrees of freedom,transparency}, file = {/home/michaelb/Zotero/storage/V555Q9F6/Claesen et al. - 2021 - Comparing dream to reality an assessment of adherence of the first generation of preregistered stud.pdf} } +@misc{cnnCNNcomReclusiveLinux, + title = {{{CNN}}.Com - {{Reclusive Linux}} Founder Opens up - {{May}} 18, 2006}, + author = {{CNN}}, + urldate = {2024-03-11}, + howpublished = {https://edition.cnn.com/2006/BUSINESS/05/18/global.office.linustorvalds/} +} + +@misc{dennisTimBernersLee2023, + title = {Tim {{Berners-Lee}}}, + author = {Dennis, Michael Aaron}, + year = {2023}, + month = dec, + journal = {Encyclopedia Britannica}, + urldate = {2024-03-11}, + abstract = {Tim Berners-Lee, British computer scientist, generally credited as the inventor of the World Wide Web. In 2004 he was knighted by Queen Elizabeth II and received the Millennium Technology Prize from the Finnish Technology Award Foundation. 
In 2007 he was awarded the Draper Prize by the National Academy of Engineering.}, + howpublished = {https://www.britannica.com/biography/Tim-Berners-Lee}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/EACBMKS2/Tim-Berners-Lee.html} +} + +@article{dickelDigitaleInklusionZur2015, + title = {{Digitale Inklusion: Zur sozialen {\"O}ffnung des Wissenschaftssystems / Digital Inclusion: The Social Implications of Open Science}}, + shorttitle = {{Digitale Inklusion}}, + author = {Dickel, Sascha and Franzen, Martina}, + year = {2015}, + month = oct, + journal = {Zeitschrift f{\"u}r Soziologie}, + volume = {44}, + number = {5}, + pages = {330--347}, + publisher = {De Gruyter Oldenbourg}, + issn = {2366-0325}, + doi = {10.1515/zfsoz-2015-0503}, + urldate = {2024-12-15}, + abstract = {From the perspective of systems theory, science is a prototype of a self-referential functional system that maintains social distance to the public. In functionally differentiated societies, science maintains a strict regime of inclusion, which is closely tied to the professional role of the scientist as someone who produces and acquires knowledge. We suggest that the digital revolution is generating novel modes of inclusion. These take the form of functionalized subroles in which the professional role of the scientist is disassembled. By proposing a socio-theoretically informed characterization of these new modes of inclusion we aim to meet two different goals: The first is to overcome the theoretical conservatism of differentiation theory, in which diagnoses of the social openness of science are solely interpreted as semantic surface phenomena. The second is to achieve analytical distance to a societal discourse that describes these new modes of inclusion as examples of a successful democratization of science.}, + copyright = {De Gruyter expressly reserves the right to use all content for commercial text and data mining within the meaning of Section 44b of the German Copyright Act.}, + langid = {ngerman}, + keywords = {Inclusion: Citizen Science,Societal Differentiation,Sociology of Science,Web 2.0}, + file = {/home/michaelb/Zotero/storage/UXJ5I59H/Dickel and Franzen - 2015 - Digitale Inklusion Zur sozialen Öffnung des Wissenschaftssystems Digital Inclusion The Social Im.pdf} +} + +@article{dienlinAgendaOpenScience2021, + title = {An {{Agenda}} for {{Open Science}} in {{Communication}}}, + author = {Dienlin, Tobias and Johannes, Niklas and Bowman, Nicholas David and Masur, Philipp K and Engesser, Sven and K{\"u}mpel, Anna Sophie and Lukito, Josephine and Bier, Lindsey M and Zhang, Renwen and Johnson, Benjamin K and Huskey, Richard and Schneider, Frank M and Breuer, Johannes and Parry, Douglas A and Vermeulen, Ivar and Fisher, Jacob T and Banks, Jaime and Weber, Ren{\'e} and Ellis, David A and Smits, Tim and Ivory, James D and Trepte, Sabine and McEwan, Bree and Rinke, Eike Mark and Neubaum, German and Winter, Stephan and Carpenter, Christopher J and Kr{\"a}mer, Nicole and Utz, Sonja and Unkel, Julian and Wang, Xiaohui and Davidson, Brittany I and Kim, Nuri and Won, Andrea Stevenson and Domahidi, Emese and Lewis, Neil A and {de Vreese}, Claes}, + year = {2021}, + month = feb, + journal = {Journal of Communication}, + volume = {71}, + number = {1}, + pages = {1--26}, + issn = {0021-9916}, + doi = {10.1093/joc/jqz052}, + urldate = {2024-12-16}, + abstract = {In the last 10 years, many canonical findings in the social sciences appear unreliable. 
This so-called ``replication crisis'' has spurred calls for open science practices, which aim to increase the reproducibility, replicability, and generalizability of findings. Communication research is subject to many of the same challenges that have caused low replicability in other fields. As a result, we propose an agenda for adopting open science practices in Communication, which includes the following seven suggestions: (1) publish materials, data, and code; (2) preregister studies and submit registered reports; (3) conduct replications; (4) collaborate; (5) foster open science skills; (6) implement Transparency and Openness Promotion Guidelines; and (7) incentivize open science practices. Although in our agenda we focus mostly on quantitative research, we also reflect on open science practices relevant to qualitative research. We conclude by discussing potential objections and concerns associated with open science practices.}, + file = {/home/michaelb/Zotero/storage/GH7PZSVG/Dienlin et al. - 2021 - An Agenda for Open Science in Communication.pdf;/home/michaelb/Zotero/storage/FUUT9S83/5803422.html} +} + +@article{dollBayesianModelSelection2019, + title = {Bayesian {{Model Selection}} in {{Fisheries Management}} and {{Ecology}}}, + author = {Doll, Jason C. and Jacquemin, Stephen J.}, + year = {2019}, + month = sep, + journal = {Journal of Fish and Wildlife Management}, + volume = {10}, + number = {2}, + pages = {691--707}, + issn = {1944-687X}, + doi = {10.3996/042019-JFWM-024}, + urldate = {2024-12-13}, + abstract = {Researchers often test ecological hypotheses relating to a myriad of questions ranging from assemblage structure, population dynamics, demography, abundance, growth rate, and more using mathematical models that explain trends in data. To aid in the evaluation process when faced with competing hypotheses, we employ statistical methods to evaluate the validity of these multiple hypotheses with the goal of deriving the most robust conclusions possible. In fisheries management and ecology, frequentist methodologies have largely dominated this approach. However, in recent years, researchers have increasingly used Bayesian inference methods to estimate model parameters. Our aim with this perspective is to provide the practicing fisheries ecologist with an accessible introduction to Bayesian model selection. Here we discuss Bayesian inference methods for model selection in the context of fisheries management and ecology with empirical examples to guide researchers in the use of these methods. In this perspective we discuss three methods for selecting among competing models. For comparing two models we discuss Bayes factor and for more complex models we discuss Watanabe--Akaike information criterion and leave-one-out cross-validation. We also describe what kinds of information to report when conducting Bayesian inference. We conclude this review with a discussion of final thoughts about these model selection techniques.}, + file = {/home/michaelb/Zotero/storage/Y57Q6CN3/Doll and Jacquemin - 2019 - Bayesian Model Selection in Fisheries Management and Ecology.pdf} +} + +@misc{dunleavyUseMisuseClassical2021, + title = {The {{Use}} and {{Misuse}} of {{Classical Statistics}}: {{A Primer}} for {{Social Workers}}}, + author = {Dunleavy, Daniel J. 
and Lacasse, Jeffrey R.}, + year = {2021}, + urldate = {2024-12-13}, + howpublished = {https://journals.sagepub.com/doi/10.1177/10497315211008247}, + file = {/home/michaelb/Zotero/storage/NI8KLN2F/10497315211008247.html} +} + +@article{eisendInternetNewMedium2002, + title = {The {{Internet}} as a New Medium for the Sciences? {{The}} Effects of {{Internet}} Use on Traditional Scientific Communication Media among Social Scientists in {{Germany}}}, + shorttitle = {The {{Internet}} as a New Medium for the Sciences?}, + author = {Eisend, Martin}, + year = {2002}, + month = jan, + journal = {Online Information Review}, + volume = {26}, + number = {5}, + pages = {307--317}, + publisher = {MCB UP Ltd}, + issn = {1468-4527}, + doi = {10.1108/14684520210447877}, + urldate = {2024-12-13}, + abstract = {Scientific communication takes place within two main fields: research and publication. Whereas twentieth century audio-visual media did not become established in the scientific communication system, the Internet, with its variety of communication options, is able to enter both fields of communication and has even revolutionised this communication system to some extent. The investigation of this relationship is based on data from a study of social scientists taken in Berlin in autumn 1999. The Internet substitutes written communication media and complements forms of spoken communication in the field of research. It also complements traditional publisher-oriented forms of publication and is even a substitute for works that have previously avoided publication. Therefore, the Internet should not be regarded as a new alternative to traditional and institutionalised structures of communication of scientific publications, as it has already become institutionalised in the field of research as a medium of interpersonal communication.}, + keywords = {Communications,Electronic publishing,Internet,Publishing}, + file = {/home/michaelb/Zotero/storage/BQ76I2XY/Eisend - 2002 - The Internet as a new medium for the sciences The effects of Internet use on traditional scientific.pdf} +} + @article{evansImprovingEvidencebasedPractice2023, title = {Improving Evidence-Based Practice through Preregistration of Applied Research: {{Barriers}} and Recommendations}, shorttitle = {Improving Evidence-Based Practice through Preregistration of Applied Research}, @@ -86,7 +380,6 @@ publisher = {Taylor \& Francis}, issn = {0898-9621}, doi = {10.1080/08989621.2021.1969233}, - url = {https://doi.org/10.1080/08989621.2021.1969233}, urldate = {2024-11-06}, abstract = {Preregistration is the practice of publicly publishing plans on central components of the research process before access to, or collection, of data. Within the context of the replication crisis, open science practices like preregistration have been pivotal in facilitating greater transparency in research. However, such practices have been applied nearly exclusively to basic academic research, with rare consideration of the relevance to applied and consultancy-based research. This is particularly problematic as such research is typically reported with very low levels of transparency and accountability despite being disseminated as influential gray literature to inform practice. Evidence-based practice is best served by an appreciation of multiple sources of quality evidence, thus the current review considers the potential of preregistration to improve both the accessibility and credibility of applied research toward more rigorous evidence-based practice. 
The current three-part review outlines, first, the opportunities of preregistration for applied research, and second, three barriers -- practical challenges, stakeholder roles, and the suitability of preregistration. Last, this review makes four recommendations to overcome these barriers and maximize the opportunities of preregistration for academics, industry, and the structures they are held within -- changes to preregistration templates, new types of templates, education and training, and recognition and structural changes.}, pmid = {34396837}, @@ -94,6 +387,119 @@ file = {/home/michaelb/Zotero/storage/CYN3BKSJ/Evans et al. - 2023 - Improving evidence-based practice through preregistration of applied research Barriers and recommen.pdf} } +@misc{FalsePositivePsychologyUndisclosed, + title = {False-{{Positive Psychology}}: {{Undisclosed Flexibility}} in {{Data Collection}} and {{Analysis Allows Presenting Anything}} as {{Significant}}}, + author = {Simmons, Joseph P. and Nelson, Leif D. and Simonsohn, Uri}, + year = {2011}, + journal = {Psychological Science}, + doi = {10.1177/0956797611417632}, + urldate = {2024-12-15}, + howpublished = {https://journals.sagepub.com/doi/10.1177/0956797611417632}, + file = {/home/michaelb/Zotero/storage/JQDNBLUB/0956797611417632.html} +} + +@article{fanelliNegativeResultsAre2012, + title = {Negative Results Are Disappearing from Most Disciplines and Countries}, + author = {Fanelli, Daniele}, + year = {2012}, + month = mar, + journal = {Scientometrics}, + volume = {90}, + number = {3}, + pages = {891--904}, + issn = {1588-2861}, + doi = {10.1007/s11192-011-0494-7}, + urldate = {2024-12-15}, + abstract = {Concerns that the growing competition for funding and citations might distort science are frequently discussed, but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavours negative results not only distorts the scientific literature directly, but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analysed over 4,600 papers published in all disciplines between 1990 and 2007, measuring the frequency of papers that, having declared to have ``tested'' a hypothesis, reported a positive support for it. The overall frequency of positive supports has grown by over 22\% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States had published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom).
Methodological artefacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing.}, + langid = {english}, + keywords = {Bias,Competition,Misconduct,Publication,Publish or perish,Research evaluation}, + file = {/home/michaelb/Zotero/storage/LLSK77JK/Fanelli - 2012 - Negative results are disappearing from most disciplines and countries.pdf} +} + +@article{fergusonSurveyOpenScience2023, + title = {Survey of Open Science Practices and Attitudes in the Social Sciences}, + author = {Ferguson, Joel and Littman, Rebecca and Christensen, Garret and Paluck, Elizabeth Levy and Swanson, Nicholas and Wang, Zenan and Miguel, Edward and Birke, David and Pezzuto, John-Henry}, + year = {2023}, + month = sep, + journal = {Nature Communications}, + volume = {14}, + number = {1}, + pages = {5401}, + publisher = {Nature Publishing Group}, + issn = {2041-1723}, + doi = {10.1038/s41467-023-41111-1}, + urldate = {2024-12-15}, + abstract = {Open science practices such as posting data or code and pre-registering analyses are increasingly prescribed and debated in the applied sciences, but the actual popularity and lifetime usage of these practices remain unknown. This study provides an assessment of attitudes toward, use of, and perceived norms regarding open science practices from a sample of authors published in top-10 (most-cited) journals and PhD students in top-20 ranked North American departments from four major social science disciplines: economics, political science, psychology, and sociology. We observe largely favorable private attitudes toward widespread lifetime usage (meaning that a researcher has used a particular practice at least once) of open science practices. As of 2020, nearly 90\% of scholars had ever used at least one such practice. Support for posting data or code online is higher (88\% overall support and nearly at the ceiling in some fields) than support for pre-registration (58\% overall). With respect to norms, there is evidence that the scholars in our sample appear to underestimate the use of open science practices in their field. We also document that the reported lifetime prevalence of open science practices increased from 49\% in 2010 to 87\% a decade later.}, + copyright = {2023 The Author(s)}, + langid = {english}, + keywords = {Economics,Human behaviour,Interdisciplinary studies,Psychology,Sociology}, + file = {/home/michaelb/Zotero/storage/NYJJF8KD/Ferguson et al. - 2023 - Survey of open science practices and attitudes in the social sciences.pdf} +} + +@article{figueroaPredictingSampleSize2012, + title = {Predicting Sample Size Required for Classification Performance}, + author = {Figueroa, Rosa L. and {Zeng-Treitler}, Qing and Kandula, Sasikiran and Ngo, Long H.}, + year = {2012}, + month = dec, + journal = {BMC Medical Informatics and Decision Making}, + volume = {12}, + number = {1}, + pages = {1--10}, + publisher = {BioMed Central}, + issn = {1472-6947}, + doi = {10.1186/1472-6947-12-8}, + urldate = {2024-12-16}, + abstract = {Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. 
We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 to 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p {$<$} 0.05). This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.}, + copyright = {2012 Figueroa et al; licensee BioMed Central Ltd.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/P9KWW7SU/Figueroa et al. - 2012 - Predicting sample size required for classification performance.pdf} +} + +@article{finkReplicationCodeAvailability, + title = {Replication Code Availability over Time and across Fields: {{Evidence}} from the {{German Socio-Economic Panel}}}, + shorttitle = {Replication Code Availability over Time and across Fields}, + author = {Fink, Lukas and Marcus, Jan}, + year = {2024}, + journal = {Economic Inquiry}, + volume = {n/a}, + number = {n/a}, + issn = {1465-7295}, + doi = {10.1111/ecin.13267}, + urldate = {2024-12-15}, + abstract = {Providing replication code is an inexpensive way to facilitate reproducibility. However, little is known about the extent of replication code provision. Therefore, we examine the availability of replication code for over 2500 peer-reviewed articles based on the German Socio-Economic Panel (SOEP), one of the most widely used datasets in economics and other social sciences. We find that only 6\% of SOEP-based studies have code available, but that this proportion has increased sharply over time. We provide evidence that the increase in code provision is driven by technological advances, individual researcher initiatives, and journal policies.}, + copyright = {{\copyright} 2024 The Author(s).
Economic Inquiry published by Wiley Periodicals LLC on behalf of Western Economic Association International.}, + langid = {english}, + keywords = {code availability,journal policies,replication code,reproducibility,SOEP}, + file = {/home/michaelb/Zotero/storage/W33WQ3CA/Fink and Marcus - Replication code availability over time and across fields Evidence from the German Socio-Economic P.pdf;/home/michaelb/Zotero/storage/7ENSBIJZ/ecin.html} +} + +@article{fox142OpenScience2021, + title = {142 {{Open Science}}: {{Improving Access}} and {{Reducing Bias}} in {{Science}}}, + shorttitle = {142 {{Open Science}}}, + author = {Fox, Nick}, + year = {2021}, + month = nov, + journal = {Journal of Animal Science}, + volume = {99}, + number = {Supplement\_3}, + pages = {75--76}, + issn = {1525-3163}, + doi = {10.1093/jas/skab235.136}, + urldate = {2024-12-13}, + abstract = {The promise of science lies in the discovery of basic knowledge, new treatments for disease and possible solutions to the world's problems. Fulfilling this promise requires confidence that the findings of published science are valid---that they represent an unbiased conclusion based on available data. In recent years, however, a ``reproducibility crisis'' has emerged indicating that published findings across research fields may be less credible than they seem, perhaps due to hidden biases in the research process. This talk will provide an overview of the key challenges that reduce the credibility and reproducibility of research and will discuss how open science practices address these challenges. Current practice is sustained by a dysfunctional incentive structure that prioritizes publication over accuracy. Changing the research culture to prioritize ``getting it right'' over ``getting it published'' requires nudges to the incentive landscape, while still fueling the engine of innovation and discovery that drives science into new domains.}, + file = {/home/michaelb/Zotero/storage/L5KP3EZB/Fox - 2021 - 142 Open Science Improving Access and Reducing Bias in Science.pdf;/home/michaelb/Zotero/storage/73RP5SGM/6383842.html} +} + +@book{francoHandbuchKarlPopper2019, + title = {{Handbuch Karl Popper}}, + editor = {Franco, Giuseppe}, + year = {2019}, + publisher = {Springer Fachmedien}, + address = {Wiesbaden}, + doi = {10.1007/978-3-658-16239-9}, + urldate = {2024-12-11}, + copyright = {http://www.springer.com/tdm}, + isbn = {978-3-658-16238-2 978-3-658-16239-9}, + langid = {ngerman}, + keywords = {Falsifikation,Kritischer Rationalismus,Popper Karl,Positivismusstreit,Wissenschaftstheorie}, + file = {/home/michaelb/Zotero/storage/XGF9DSKG/Franco - 2019 - Handbuch Karl Popper.pdf} +} + @article{francoPublicationBiasSocial2014, title = {Publication Bias in the Social Sciences: {{Unlocking}} the File Drawer}, shorttitle = {Publication Bias in the Social Sciences}, @@ -106,12 +512,27 @@ pages = {1502--1505}, publisher = {American Association for the Advancement of Science}, doi = {10.1126/science.1255484}, - url = {https://www.science.org/doi/10.1126/science.1255484}, urldate = {2024-11-06}, abstract = {We studied publication bias in the social sciences by analyzing a known population of conducted studies---221 in total---in which there is a full accounting of what is published and unpublished. We leveraged Time-sharing Experiments in the Social Sciences (TESS), a National Science Foundation--sponsored program in which researchers propose survey-based experiments to be run on representative samples of American adults. 
Because TESS proposals undergo rigorous peer review, the studies in the sample all exceed a substantial quality threshold. Strong results are 40 percentage points more likely to be published than are null results and 60 percentage points more likely to be written up. We provide direct evidence of publication bias and identify the stage of research production at which publication bias occurs: Authors do not write up and submit null findings.}, file = {/home/michaelb/Zotero/storage/3INXI5Z4/Franco et al. - 2014 - Publication bias in the social sciences Unlocking the file drawer.pdf} } +@article{freeseAdvancesTransparencyReproducibility2022, + title = {Advances in Transparency and Reproducibility in the Social Sciences}, + author = {Freese, Jeremy and Rauf, Tamkinat and Voelkel, Jan Gerrit}, + year = {2022}, + month = sep, + journal = {Social Science Research}, + volume = {107}, + pages = {102770}, + issn = {0049-089X}, + doi = {10.1016/j.ssresearch.2022.102770}, + urldate = {2024-12-15}, + abstract = {Worries about a ``credibility crisis'' besieging science have ignited interest in research transparency and reproducibility as ways of restoring trust in published research. For quantitative social science, advances in transparency and reproducibility can be seen as a set of developments whose trajectory predates the recent alarm. We discuss several of these developments, including preregistration, data-sharing, formal infrastructure in the form of resources and policies, open access to research, and specificity regarding research contributions. We also discuss the spillovers of this predominantly quantitative effort towards transparency for qualitative research. We conclude by emphasizing the importance of mutual accountability for effective science, the essential role of openness for this accountability, and the importance of scholarly inclusiveness in figuring out the best ways for openness to be accomplished in practice.}, + keywords = {Open science,Reproducibility,Transparency}, + file = {/home/michaelb/Zotero/storage/UTPDRL49/S0049089X2200076X.html} +} + @article{freeseReplicationSocialScience2017, title = {Replication in {{Social Science}}}, author = {Freese, Jeremy and Peterson, David}, @@ -124,13 +545,96 @@ publisher = {Annual Reviews}, issn = {0360-0572, 1545-2115}, doi = {10.1146/annurev-soc-060116-053450}, - url = {https://www.annualreviews.org/content/journals/10.1146/annurev-soc-060116-053450}, + urldate = {2024-12-15}, + abstract = {Across the medical and social sciences, new discussions about replication have led to transformations in research practice. Sociologists, however, have been largely absent from these discussions. The goals of this review are to introduce sociologists to these developments, synthesize insights from science studies about replication in general, and detail the specific issues regarding replication that occur in sociology. The first half of the article argues that a sociologically sophisticated understanding of replication must address both the ways that replication rules and conventions evolved within an epistemic culture and how those cultures are shaped by specific research challenges. The second half outlines the four main dimensions of replicability in quantitative sociology---verifiability, robustness, repeatability, and generalizability---and discusses the specific ambiguities of interpretation that can arise in each. 
We conclude by advocating some commonsense changes to promote replication while acknowledging the epistemic diversity of our field.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/FZNRE6US/Freese and Peterson - 2017 - Replication in Social Science.pdf;/home/michaelb/Zotero/storage/MVZMR367/annurev-soc-060116-053450.html} +} + +@article{freeseReplicationStandardsQuantitative2007, + title = {Replication {{Standards}} for {{Quantitative Social Science}}: {{Why Not Sociology}}?}, + shorttitle = {Replication {{Standards}} for {{Quantitative Social Science}}}, + author = {Freese, Jeremy}, + year = {2007}, + month = nov, + journal = {Sociological Methods \& Research}, + volume = {36}, + number = {2}, + pages = {153--172}, + publisher = {SAGE Publications Inc}, + issn = {0049-1241}, + doi = {10.1177/0049124107306659}, + urldate = {2024-12-15}, + abstract = {The credibility of quantitative social science benefits from policies that increase confidence that results reported by one researcher can be verified by others. Concerns about replicability have increased as the scale and sophistication of analyses increase the possible dependence of results on subtle analytic decisions and decrease the extent to which published articles contain full descriptions of methods. The author argues that sociology should adopt standards regarding replication that minimize its conceptualization as an ethical and individualistic matter and advocates for a policy in which authors use independent online archives to deposit the maximum possible information for replicating published results at the time of publication and are explicit about the conditions of availability for any necessary materials that are not provided.
The author responds to several objections that might be raised to increasing the transparency of quantitative sociology in this way and offers a candidate replication policy for sociology.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/2HDDFW84/Freese - 2007 - Replication Standards for Quantitative Social Science Why Not Sociology.pdf} +} + +@article{gelmanInductionDeductionBaysian2011, + title = {Induction and {{Deduction}} in {{Bayesian Data Analysis}}}, + author = {Gelman, A.}, + year = {2011}, + journal = {Rationality, Markets and Morals}, + urldate = {2024-12-13}, + abstract = {The classical or frequentist approach to statistics (in which inference is centered on significance testing), is associated with a philosophy in which science is deductive and follows Popper's doctrine of falsification. In contrast, Bayesian inference is commonly associated with inductive reasoning and the idea that a model can be dethroned by a competing model but can never be directly falsified by a significance test. The purpose of this article is to break these associations, which I think are incorrect and have been detrimental to statistical practice, in that they have steered falsificationists away from the very useful tools of Bayesian inference and have discouraged Bayesians from checking the fit of their models. From my experience using and developing Bayesian methods in social and environmental science, I have found model checking and falsification to be central in the modeling process.}, + file = {/home/michaelb/Zotero/storage/SRF9DCGD/Gelman - 2011 - Induction and Deduction in Baysian Data Analysis.pdf} +} + +@article{gerberPublicationBiasEmpirical2008, + title = {Publication {{Bias}} in {{Empirical Sociological Research}}: {{Do Arbitrary Significance Levels Distort Published Results}}?}, + shorttitle = {Publication {{Bias}} in {{Empirical Sociological Research}}}, + author = {Gerber, Alan S. and Malhotra, Neil}, + year = {2008}, + month = aug, + journal = {Sociological Methods \& Research}, + volume = {37}, + number = {1}, + pages = {3--30}, + publisher = {SAGE Publications Inc}, + issn = {0049-1241}, + doi = {10.1177/0049124108318973}, + urldate = {2024-12-15}, + abstract = {Despite great attention to the quality of research methods in individual studies, if publication decisions of journals are a function of the statistical significance of research findings, the published literature as a whole may not produce accurate measures of true effects. This article examines the two most prominent sociology journals (the American Sociological Review and the American Journal of Sociology) and another important though less influential journal (The Sociological Quarterly) for evidence of publication bias. The effect of the .05 significance level on the pattern of published findings is examined using a ``caliper'' test, and the hypothesis of no publication bias can be rejected at approximately the 1 in 10 million level. Findings suggest that some of the results reported in leading sociology journals may be misleading and inaccurate due to publication bias.
Some reasons for publication bias and proposed reforms to reduce its impact on research are also discussed.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/S6Y6KRTC/Gerber and Malhotra - 2008 - Publication Bias in Empirical Sociological Research Do Arbitrary Significance Levels Distort Publis.pdf} +} + +@article{goodmanTenSimpleRules2014, + title = {Ten {{Simple Rules}} for the {{Care}} and {{Feeding}} of {{Scientific Data}}}, + author = {Goodman, Alyssa and Pepe, Alberto and Blocker, Alexander W. and Borgman, Christine L. and Cranmer, Kyle and Crosas, Merce and Stefano, Rosanne Di and Gil, Yolanda and Groth, Paul and Hedstrom, Margaret and Hogg, David W. and Kashyap, Vinay and Mahabal, Ashish and Siemiginowska, Aneta and Slavkovic, Aleksandra}, + year = {2014}, + month = apr, + journal = {PLOS Computational Biology}, + volume = {10}, + number = {4}, + pages = {e1003542}, + publisher = {Public Library of Science}, + issn = {1553-7358}, + doi = {10.1371/journal.pcbi.1003542}, + urldate = {2024-12-13}, + langid = {english}, + keywords = {Archives,Computer software,Data management,Data visualization,Metadata,Open source software,Scientists,Software tools}, + file = {/home/michaelb/Zotero/storage/EKZW5LUC/Goodman et al. - 2014 - Ten Simple Rules for the Care and Feeding of Scientific Data.pdf} +} + @article{greenspanOpenSciencePractices2024, title = {Open Science Practices in Criminology and Criminal Justice Journals}, author = {Greenspan, Rachel Leigh and Baggett, Logan and B. Boutwell, Brian}, @@ -139,7 +643,6 @@ journal = {Journal of Experimental Criminology}, issn = {1572-8315}, doi = {10.1007/s11292-024-09640-x}, - url = {https://doi.org/10.1007/s11292-024-09640-x}, urldate = {2024-11-06}, abstract = {Calls for more transparent and replicable scientific practices have been increasing across scientific disciplines over the last decade, often referred to as the open science movement. Open science practices are arguably particularly important in fields like criminology and criminal justice where empirical findings aim to inform public policy and legal practice. Despite favorable views of these practices by criminal justice scholars, limited research has explored how often researchers actually use these open science practices.}, langid = {english}, @@ -147,6 +650,26 @@ file = {/home/michaelb/Zotero/storage/I2BVQP5G/Greenspan et al. - 2024 - Open science practices in criminology and criminal justice journals.pdf} } +@article{hardwickeReducingBiasIncreasing2023, + title = {Reducing Bias, Increasing Transparency and Calibrating Confidence with Preregistration}, + author = {Hardwicke, Tom E. and Wagenmakers, Eric-Jan}, + year = {2023}, + month = jan, + journal = {Nature Human Behaviour}, + volume = {7}, + number = {1}, + pages = {15--26}, + publisher = {Nature Publishing Group}, + issn = {2397-3374}, + doi = {10.1038/s41562-022-01497-2}, + urldate = {2024-12-15}, + abstract = {Flexibility in the design, analysis and interpretation of scientific studies creates a multiplicity of possible research outcomes. Scientists are granted considerable latitude to selectively use and report the hypotheses, variables and analyses that create the most positive, coherent and attractive story while suppressing those that are negative or inconvenient. This creates a risk of bias that can lead to scientists fooling themselves and fooling others. 
Preregistration involves declaring a research plan (for example, hypotheses, design and statistical analyses) in a public registry before the research outcomes are known. Preregistration (1) reduces the risk of bias by encouraging outcome-independent decision-making and (2) increases transparency, enabling others to assess the risk of bias and calibrate their confidence in research outcomes. In this Perspective, we briefly review the historical evolution of preregistration in medicine, psychology and other domains, clarify its pragmatic functions, discuss relevant meta-research, and provide recommendations for scientists and journal editors.}, + copyright = {2022 Springer Nature Limited}, + langid = {english}, + keywords = {Science,Scientific community,technology and society}, + file = {/home/michaelb/Zotero/storage/W5FLB8LI/Hardwicke and Wagenmakers - 2023 - Reducing bias, increasing transparency and calibrating confidence with preregistration.pdf} +} + @article{havronPreregistrationInfantResearch2020, title = {Preregistration in Infant Research---{{A}} Primer}, author = {Havron, Naomi and Bergmann, Christina and Tsuji, Sho}, @@ -157,7 +680,6 @@ pages = {734--754}, issn = {1532-7078}, doi = {10.1111/infa.12353}, - url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/infa.12353}, urldate = {2024-11-06}, abstract = {Preregistration, the act of specifying a research plan in advance, is becoming more common in scientific research. Infant researchers contend with unique problems that might make preregistration particularly challenging. Infants are a hard-to-reach population, usually yielding small sample sizes, they can only complete a limited number of trials, and they can be excluded based on hard-to-predict complications (e.g., parental interference, fussiness). In addition, as effects themselves potentially change with age and population, it is hard to calculate an a priori effect size. At the same time, these very factors make preregistration in infant studies a valuable tool. A priori examination of the planned study, including the hypotheses, sample size, and resulting statistical power, increases the credibility of single studies and adds value to the field. Preregistration might also improve explicit decision making to create better studies. We present an in-depth discussion of the issues uniquely relevant to infant researchers, and ways to contend with them in preregistration and study planning. We provide recommendations to researchers interested in following current best practices.}, copyright = {{\copyright} 2020 International Congress of Infant Studies (ICIS)}, @@ -165,6 +687,36 @@ file = {/home/michaelb/Zotero/storage/7MTAJ6I2/Havron et al. - 2020 - Preregistration in infant research—A primer.pdf;/home/michaelb/Zotero/storage/DF3KLSUF/infa.html} } +@article{jandotInteractiveSemanticFeaturing2016, + title = {Interactive {{Semantic Featuring}} for {{Text Classification}}}, + author = {Jandot, Camille and Simard, Patrice Y. and Chickering, D. M. and Grangier, David and Suh, Jina}, + year = {2016}, + month = jun, + journal = {ArXiv}, + urldate = {2024-12-16}, + abstract = {In text classification, dictionaries can be used to define human-comprehensible features. We propose an improvement to dictionary features called smoothed dictionary features. These features recognize document contexts instead of n-grams. 
We describe a principled methodology to solicit dictionary features from a teacher, and present results showing that models built using these human-comprehensible features are competitive with models trained with Bag of Words features.}, + file = {/home/michaelb/Zotero/storage/UVTS96I8/Jandot et al. - 2016 - Interactive Semantic Featuring for Text Classification.pdf} +} + +@article{jarolimkovaDataSharingIntegral2023, + title = {Data Sharing: An Integral Part of Research Practice?}, + shorttitle = {Data Sharing}, + author = {Jarolimkova, Adela}, + year = {2023}, + month = dec, + journal = {Qualitative and Quantitative Methods in Libraries}, + volume = {12}, + number = {4}, + pages = {609--620}, + issn = {2241-1925}, + urldate = {2024-12-15}, + abstract = {Sharing research data is now recognised as an integral part of scientific work and as a service to the public, contributing to the development of knowledge and the transparency of research. However, as many studies have shown, data sharing policies and practices vary widely across disciplines, countries; and funding bodies, and ultimately depend on the motivation and attitudes of individual researchers. The author focuses on researchers' attitudes to data sharing, drawing on an extensive literature review of data sharing studies. The author describes the factors that influence researchers' data sharing at an individual level, and the motivations and barriers that prevent effective access to data.}, + copyright = {Copyright (c) 2023 Qualitative and Quantitative Methods in Libraries}, + langid = {english}, + keywords = {attitudes,barriers,data sharing,motivation}, + file = {/home/michaelb/Zotero/storage/XYDAMD7M/Jarolimkova - 2023 - Data sharing an integral part of research practice.pdf} +} + @article{johnsonPreregistrationSingleCaseDesign2019, title = {Preregistration in {{Single-Case Design Research}}}, author = {Johnson, Austin H. and Cook, Bryan G.}, @@ -177,13 +729,48 @@ publisher = {SAGE Publications Inc}, issn = {0014-4029}, doi = {10.1177/0014402919868529}, - url = {https://doi.org/10.1177/0014402919868529}, urldate = {2024-11-06}, abstract = {To draw informed conclusions from research studies, research consumers need full and accurate descriptions of study methods and procedures. Preregistration has been proposed as a means to clarify reporting of research methods and procedures, with the goal of reducing bias in research. However, preregistration has been applied primarily to research studies utilizing group designs. In this article, we discuss general issues in preregistration and consider the use of preregistration in single-case design research, particularly as it relates to differing applications of this methodology. 
We then provide a rationale and make specific recommendations for preregistering single-case design research, including guidelines for preregistering basic descriptive information, research questions, participant characteristics, baseline conditions, independent and dependent variables, hypotheses, and phase-change decisions.}, langid = {english}, file = {/home/michaelb/Zotero/storage/Z34LN54E/Johnson and Cook - 2019 - Preregistration in Single-Case Design Research.pdf} } +@article{kimResearchPaperClassification2019, + title = {Research Paper Classification Systems Based on {{TF-IDF}} and {{LDA}} Schemes}, + author = {Kim, Sang-Woon and Gil, Joon-Min}, + year = {2019}, + month = aug, + journal = {Human-centric Computing and Information Sciences}, + volume = {9}, + number = {1}, + pages = {30}, + issn = {2192-1962}, + doi = {10.1186/s13673-019-0192-7}, + urldate = {2024-12-16}, + abstract = {With the increasing advance of computer and information technologies, numerous research papers have been published online as well as offline, and as new research fields have been continuingly created, users have a lot of trouble in finding and categorizing their interesting research papers. In order to overcome the limitations, this paper proposes a research paper classification system that can cluster research papers into the meaningful class in which papers are very likely to have similar subjects. The proposed system extracts representative keywords from the abstracts of each paper and topics by Latent Dirichlet allocation (LDA) scheme. Then, the K-means clustering algorithm is applied to classify the whole papers into research papers with similar subjects, based on the Term frequency-inverse document frequency (TF-IDF) values of each paper.}, + langid = {english}, + keywords = {Artificial Intelligence,K-means clustering,LDA,Paper classification,TF-IDF}, + file = {/home/michaelb/Zotero/storage/23YFBPYR/Kim and Gil - 2019 - Research paper classification systems based on TF-IDF and LDA schemes.pdf} +} + +@article{kraussDebunkingRevolutionaryParadigm2024, + title = {Debunking Revolutionary Paradigm Shifts: Evidence of Cumulative Scientific Progress across Science}, + shorttitle = {Debunking Revolutionary Paradigm Shifts}, + author = {Krauss, Alexander}, + year = {2024}, + month = nov, + journal = {Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences}, + volume = {480}, + number = {2302}, + pages = {20240141}, + publisher = {Royal Society}, + doi = {10.1098/rspa.2024.0141}, + urldate = {2024-12-13}, + abstract = {How can scientific progress be conceived best? Does science mainly undergo revolutionary paradigm shifts? Or is the evolution of science mainly cumulative? Understanding whether science advances through cumulative evolution or through paradigm shifts can influence how we approach scientific research, education and policy. The most influential and cited account of science was put forth in Thomas Kuhn's seminal book The structure of scientific revolutions. Kuhn argues that science does not advance cumulatively but goes through fundamental paradigm changes in the theories of a scientific field. There is no consensus yet on this core question of the nature and advancement of science that has since been debated across science. Examining over 750 major scientific discoveries (all Nobel Prize and major non-Nobel Prize discoveries), we systematically test this fundamental question about scientific progress here. 
We find that three key measures of scientific progress---major discoveries, methods and fields---each demonstrate that science evolves cumulatively. First, we show that no major scientific methods or instruments used across fields (such as statistical methods, X-ray methods or chromatography) have been completely abandoned, i.e. subject to paradigm shifts. Second, no major scientific fields (such as biomedicine, chemistry or computer science) have been completely abandoned. Rather, they have all continuously expanded over time, often over centuries, accumulating extensive bodies of knowledge. Third, scientific discoveries including theoretical discoveries are also predominately cumulative, with only 1\% of over 750 major discoveries having been abandoned. The continuity of science is most compellingly evidenced by our methods and instruments, which enable the creation of discoveries and fields. We thus offer here a new perspective and answer to this classic question in science and the philosophy and history of science by utilizing methods from statistics and empirical sciences.}, + keywords = {discovery,paradigm change,paradigm shift,scientific discovery,scientific progress,structure of scientific revolutions}, + file = {/home/michaelb/Zotero/storage/DQLA2ER2/Krauss - 2024 - Debunking revolutionary paradigm shifts evidence of cumulative scientific progress across science.pdf} +} + @article{kuhbergerPublicationBiasPsychology2014, title = {Publication {{Bias}} in {{Psychology}}: {{A Diagnosis Based}} on the {{Correlation}} between {{Effect Size}} and {{Sample Size}}}, shorttitle = {Publication {{Bias}} in {{Psychology}}}, @@ -197,7 +784,6 @@ publisher = {Public Library of Science}, issn = {1932-6203}, doi = {10.1371/journal.pone.0105825}, - url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105825}, urldate = {2024-11-06}, abstract = {Background The p value obtained from a significance test provides no information about the magnitude or importance of the underlying phenomenon. Therefore, additional reporting of effect size is often recommended. Effect sizes are theoretically independent from sample size. Yet this may not hold true empirically: non-independence could indicate publication bias. Methods We investigate whether effect size is independent from sample size in psychological research. We randomly sampled 1,000 psychological articles from all areas of psychological research. We extracted p values, effect sizes, and sample sizes of all empirical papers, and calculated the correlation between effect size and sample size, and investigated the distribution of p values. Results We found a negative correlation of r = -.45 [95\% CI: -.53; -.35] between effect size and sample size. In addition, we found an inordinately high number of p values just passing the boundary of significance. Additional data showed that neither implicit nor explicit power analysis could account for this pattern of findings. Conclusion The negative correlation between effect size and samples size, and the biased distribution of p values indicate pervasive publication bias in the entire field of psychology.}, langid = {english}, @@ -205,6 +791,91 @@ file = {/home/michaelb/Zotero/storage/SHQZWBDE/Kühberger et al. 
- 2014 - Publication Bias in Psychology A Diagnosis Based on the Correlation between Effect Size and Sample.pdf}
}

+@incollection{kuhnReflectionsMyCritics1970,
+  title = {Reflections on My {{Critics}}},
+  booktitle = {Criticism and the {{Growth}} of {{Knowledge}}: {{Proceedings}} of the {{International Colloquium}} in the {{Philosophy}} of {{Science}}, {{London}}, 1965},
+  author = {Kuhn, T. S.},
+  editor = {Lakatos, Imre and Musgrave, Alan},
+  year = {1970},
+  volume = {4},
+  pages = {231--278},
+  publisher = {Cambridge University Press},
+  address = {Cambridge},
+  doi = {10.1017/CBO9781139171434.011},
+  urldate = {2024-12-13},
+  abstract = {1. Introduction. 2. Methodology: the role of history and sociology. 3. Normal Science: its nature and functions. 4. Normal Science: its retrieval from history. 5. Irrationality and Theory-Choice. 6. Incommensurability and Paradigms. INTRODUCTION: It is now four years since Professor Watkins and I exchanged mutually impenetrable views at the International Colloquium in the Philosophy of Science held at Bedford College, London. Rereading our contributions together with those that have since accreted to them, I am tempted to posit the existence of two Thomas Kuhns. Kuhn\textsubscript{1} is the author of this essay and of an earlier piece in this volume. He also published in 1962 a book called The Structure of Scientific Revolutions, the one which he and Miss Masterman discuss above. Kuhn\textsubscript{2} is the author of another book with the same title. It is the one here cited repeatedly by Sir Karl Popper as well as by Professors Feyerabend, Lakatos, Toulmin, and Watkins. That both books bear the same title cannot be altogether accidental, for the views they present often overlap and are, in any case, expressed in the same words. But their central concerns are, I conclude, usually very different. As reported by his critics (his original has unfortunately been unavailable to me), Kuhn\textsubscript{2} seems on occasion to make points that subvert essential aspects of the position outlined by his namesake. Lacking the wit to extend this introductory fantasy, I will instead explain why I have embarked upon it.},
+  isbn = {978-0-521-09623-2},
+  file = {/home/michaelb/Zotero/storage/WM6P6A3L/Kuhn - 1970 - Reflections on my Critics.pdf;/home/michaelb/Zotero/storage/MDA7UI6R/7AC72C71EC97FCBB6AEFED1B78F0775B.html}
+}
+
+@book{kuhnStructureScientificRevolutions1962,
+  title = {The Structure of Scientific Revolutions},
+  author = {Kuhn, T. S.},
+  year = {1962},
+  publisher = {University of Chicago Press},
+  address = {Chicago},
+  abstract = {This modern classic on the philosophy of science examines the nature of scientific progress. Progress is seen as accumulative only when certain values and goals are shared; when this set of values (a paradigm) breaks down, science is seen as entering a revolutionary phase. 
Harvard Book List (edited) 1971 \#37 (PsycINFO Database Record (c) 2018 APA, all rights reserved)}, + file = {/home/michaelb/Zotero/storage/G4SDNWXQ/1962-35001-000.html} +} + +@misc{kuiperHowCriminologyAffects2023, + type = {{{SSRN Scholarly Paper}}}, + title = {How {{Criminology Affects Punishment}}: {{Analyzing Conditions Under Which Scientific Information Affects Sanction Policy Decisions}}}, + shorttitle = {How {{Criminology Affects Punishment}}}, + author = {Kuiper, Malouke Esra and Reinders Folmer, Chris and Kooistra, Emmeke Barbara and Pogarsky, Greg and {van Rooij}, Benjamin}, + year = {2023}, + month = oct, + number = {4605853}, + eprint = {4605853}, + publisher = {Social Science Research Network}, + address = {Rochester, NY}, + doi = {10.2139/ssrn.4605853}, + urldate = {2024-11-06}, + abstract = {Criminology has a strong potential to impact criminal justice policy. It is thought that criminology fails to shape policy because of the political context of such policies. The present study analyses, however, whether criminological knowledge has the capacity to shape policy decision making in the absence of an explicit political context. We do so through a vignette study (N = 212) comparing how participants make criminal sanction policy decisions with or without reading criminological findings about the deterrent effect of longer sentences and whether this can be influenced by making harm to victims salient. The study finds that criminological science can impact policy decision making outside an explicit political context, also with salient harm to victims. Our findings show that when there is no explicit political context present, criminological evidence does affect policy making, even when there is a countervailing factor such as victim salience. This shows that the science in of itself need not be the obstacle to better alignment with policy. The study offers a new research agenda to further generalize these results and to work towards a better incorporation of criminology in criminal justice policy.}, + archiveprefix = {Social Science Research Network}, + langid = {english}, + keywords = {criminal justice policy,criminological knowledge,decision-making,deterrence,policy makers,punishment}, + file = {/home/michaelb/Zotero/storage/YCU9T7S3/Kuiper et al. - 2023 - How Criminology Affects Punishment Analyzing Conditions Under Which Scientific Information Affects.pdf} +} + +@article{lawrenceFreeOnlineAvailability2001, + title = {Free Online Availability Substantially Increases a Paper's Impact}, + author = {Lawrence, Steve}, + year = {2001}, + month = may, + journal = {Nature}, + volume = {411}, + number = {6837}, + pages = {521--521}, + publisher = {Nature Publishing Group}, + issn = {1476-4687}, + doi = {10.1038/35079151}, + urldate = {2024-12-13}, + copyright = {2001 Springer Nature Limited}, + langid = {english}, + keywords = {Humanities and Social Sciences,multidisciplinary,Science}, + file = {/home/michaelb/Zotero/storage/YV4RXEEH/Lawrence - 2001 - Free online availability substantially increases a paper's impact.pdf} +} + +@article{leggettLifeJustSignificant2013, + title = {The Life of p: "Just Significant" Results Are on the Rise}, + shorttitle = {The Life of p}, + author = {Leggett, Nathan C. and Thomas, Nicole A. and Loetscher, Tobias and Nicholls, Michael E. 
R.}, + year = {2013}, + journal = {Quarterly Journal of Experimental Psychology (2006)}, + volume = {66}, + number = {12}, + pages = {2303--2309}, + issn = {1747-0226}, + doi = {10.1080/17470218.2013.863371}, + abstract = {Null hypothesis significance testing uses the seemingly arbitrary probability of .05 as a means of objectively determining whether a tested effect is reliable. Within recent psychological articles, research has found an overrepresentation of p values around this cut-off. The present study examined whether this overrepresentation is a product of recent pressure to publish or whether it has existed throughout psychological research. Articles published in 1965 and 2005 from two prominent psychology journals were examined. Like previous research, the frequency of p values at and just below .05 was greater than expected compared to p frequencies in other ranges. While this overrepresentation was found for values published in both 1965 and 2005, it was much greater in 2005. Additionally, p values close to but over .05 were more likely to be rounded down to, or incorrectly reported as, significant in 2005 than in 1965. Modern statistical software and an increased pressure to publish may explain this pattern. The problem may be alleviated by reduced reliance on p values and increased reporting of confidence intervals and effect sizes.}, + langid = {english}, + pmid = {24205936}, + keywords = {Databases Bibliographic,Humans,Periodicals as Topic,Psychology,Publication Bias,Statistics as Topic}, + file = {/home/michaelb/Zotero/storage/8IRZ9MUW/Leggett et al. - 2013 - The life of p just significant results are on the rise.pdf} +} + @article{loggPreregistrationWeighingCosts2021, title = {Pre-Registration: {{Weighing}} Costs and Benefits for Researchers}, shorttitle = {Pre-Registration}, @@ -216,7 +887,6 @@ pages = {18--27}, issn = {0749-5978}, doi = {10.1016/j.obhdp.2021.05.006}, - url = {https://www.sciencedirect.com/science/article/pii/S0749597821000649}, urldate = {2024-11-06}, abstract = {In the past decade, the social and behavioral sciences underwent a methodological revolution, offering practical prescriptions for improving the replicability and reproducibility of research results. One key to reforming science is a simple and scalable practice: pre-registration. Pre-registration constitutes pre-specifying an analysis plan prior to data collection. A growing chorus of articles discusses the prescriptive, field-wide benefits of pre-registration. To increase adoption, however, scientists need to know who currently pre-registers and understand perceived barriers to doing so. Thus, we weigh costs and benefits of pre-registration. Our survey of researchers reveals generational differences in who pre-registers and uncertainty regarding how pre-registration benefits individual researchers. We leverage these data to directly address researchers' uncertainty by clarifying why pre-registration improves the research process itself. Finally, we discuss how to pre-register and compare available resources. 
The present work examines the who, why, and how of pre-registration in order to weigh the costs and benefits of pre-registration to researchers and motivate continued adoption.}, keywords = {Methodology,Open science,Pre-registration,Replication}, @@ -235,7 +905,6 @@ pages = {193--210}, issn = {1936-4784}, doi = {10.1007/s12108-023-09563-6}, - url = {https://doi.org/10.1007/s12108-023-09563-6}, urldate = {2024-11-06}, abstract = {Both within and outside of sociology, there are conversations about methods to reduce error and improve research quality---one such method is preregistration and its counterpart, registered reports. Preregistration is the process of detailing research questions, variables, analysis plans, etc. before conducting research. Registered reports take this one step further, with a paper being reviewed on the merit of these plans, not its findings. In this manuscript, I detail preregistration's and registered reports' strengths and weaknesses for improving the quality of sociological research. I conclude by considering the implications of a structural-level adoption of preregistration and registered reports. Importantly, I do not recommend that all sociologists use preregistration and registered reports for all studies. Rather, I discuss the potential benefits and genuine limitations of preregistration and registered reports for the individual sociologist and the discipline.}, langid = {english}, @@ -254,12 +923,29 @@ pages = {739--763}, issn = {0021-9916}, doi = {10.1093/joc/jqab030}, - url = {https://doi.org/10.1093/joc/jqab030}, urldate = {2024-11-06}, abstract = {A significant paradigm shift is underway in communication research as open science practices (e.g., preregistration, open materials) are becoming more prevalent. The current work identified how much the field has embraced such practices and evaluated their impact on authors (e.g., citation rates). We collected 10,517 papers across 26 journals from 2010 to 2020, observing that 5.1\% of papers used or mentioned open science practices. Communication research has seen the rate of nonsignificant p-values (p \> .055) increasing with the adoption of open science over time, but p-values just below p \< .05 have not reduced with open science adoption. Open science adoption was unrelated to citation rate at the article level; however, it was inversely related to the journals' h-index. Our results suggest communication organizations and scholars have important work ahead to make open science more mainstream. We close with suggestions to increase open science adoption for the field at large.}, file = {/home/michaelb/Zotero/storage/WBKICQTZ/Markowitz et al. - 2021 - Tracing the Adoption and Effects of Open Science in Communication Research.pdf;/home/michaelb/Zotero/storage/KV8S4HXI/6354844.html} } +@article{matternWhyAcademicsUndershare2024, + title = {Why Academics Under-Share Research Data: {{A}} Social Relational Theory}, + shorttitle = {Why Academics Under-Share Research Data}, + author = {Mattern, Janice Bially and Kohlburn, Joseph and {Moulaison-Sandy}, Heather}, + year = {2024}, + journal = {Journal of the Association for Information Science and Technology}, + volume = {75}, + number = {9}, + pages = {988--1001}, + issn = {2330-1643}, + doi = {10.1002/asi.24938}, + urldate = {2024-12-15}, + abstract = {Despite their professed enthusiasm for open science, faculty researchers have been documented as not freely sharing their data; instead, if sharing data at all, they take a minimal approach. 
A robust research agenda in LIS has documented the data under-sharing practices in which they engage, and the motivations they profess. Using theoretical frameworks from sociology to complement research in LIS, this article examines the broader context in which researchers are situated, theorizing the social relational dynamics in academia that influence faculty decisions and practices relating to data sharing. We advance a theory that suggests that the academy has entered a period of transition, and faculty resistance to data sharing through foot-dragging is one response to shifting power dynamics. If the theory is borne out empirically, proponents of open access will need to find a way to encourage open academic research practices without undermining the social value of academic researchers.}, + copyright = {{\copyright} 2024 The Author(s). Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology.}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/63BNN3ZN/Mattern et al. - 2024 - Why academics under-share research data A social relational theory.pdf;/home/michaelb/Zotero/storage/VKRAEZB8/asi.html} +} + @article{mertensPreregistrationAnalysesPreexisting2019, title = {Preregistration of {{Analyses}} of {{Preexisting Data}}}, author = {Mertens, Ga{\"e}tan and Krypotos, Angelos-Miltiadis}, @@ -270,13 +956,92 @@ number = {1}, issn = {0033-2879}, doi = {10.5334/pb.493}, - url = {https://psychologicabelgica.com/articles/10.5334/pb.493}, urldate = {2024-11-06}, abstract = {Psychologica Belgica is the official journal of the Belgian Association for Psychological Sciences (BAPS). BAPS promotes the development of psychological sciences in Belgium, at both fundamental and applied research levels. The journal ensures rigorous peer-review to maintain research integrity. Psychological Belgica makes publications available online as soon as they are finalised. All publications are open access, making research available free of charge and without delay. The journal has a 2022 Impact Factor of 2.0 and a 5 year impact factor of 2.1. Subscribe to content alerts and other journal news here. You can also follow the journal on ResearchGate.}, langid = {american}, file = {/home/michaelb/Zotero/storage/GZRSX45A/Mertens and Krypotos - 2019 - Preregistration of Analyses of Preexisting Data.pdf} } +@article{nororiAddressingBiasBig2021, + title = {Addressing Bias in Big Data and {{AI}} for Health Care: {{A}} Call for Open Science}, + shorttitle = {Addressing Bias in Big Data and {{AI}} for Health Care}, + author = {Norori, Natalia and Hu, Qiyang and Aellen, Florence Marcelle and Faraci, Francesca Dalia and Tzovara, Athina}, + year = {2021}, + month = oct, + journal = {Patterns}, + volume = {2}, + number = {10}, + pages = {100347}, + issn = {2666-3899}, + doi = {10.1016/j.patter.2021.100347}, + urldate = {2024-12-13}, + abstract = {Artificial intelligence (AI) has an astonishing potential in assisting clinical decision making and revolutionizing the field of health care. A major open challenge that AI will need to address before its integration in the clinical routine is that of algorithmic bias. Most AI algorithms need big datasets to learn from, but several groups of the human population have a long history of being absent or misrepresented in existing biomedical datasets. 
If the training data is misrepresentative of the population variability, AI is prone to reinforcing bias, which can lead to fatal outcomes, misdiagnoses, and lack of generalization. Here, we describe the challenges in rendering AI algorithms fairer, and we propose concrete steps for addressing bias using tools from the field of open science.}, + keywords = {artificial intelligence,bias,data standards,deep learning,health care,open science,participatory science}, + file = {/home/michaelb/Zotero/storage/38I8DX6G/Norori et al. - 2021 - Addressing bias in big data and AI for health care A call for open science.pdf;/home/michaelb/Zotero/storage/KHTDQ2SW/S2666389921002026.html} +} + +@article{nosekRegisteredReports2014, + title = {Registered {{Reports}}}, + author = {Nosek, Brian A. and Lakens, Dani{\"e}l}, + year = {2014}, + month = may, + journal = {Social Psychology}, + volume = {45}, + number = {3}, + pages = {137--141}, + publisher = {Hogrefe Publishing}, + issn = {1864-9335}, + doi = {10.1027/1864-9335/a000192}, + urldate = {2024-12-16}, + file = {/home/michaelb/Zotero/storage/DYGVTARA/Nosek and Lakens - 2014 - Registered Reports.pdf} +} + +@misc{omalleyLinuxMars2021, + title = {Linux on {{Mars}}!}, + author = {O'Malley, James}, + year = {2021}, + month = aug, + journal = {ITPro}, + urldate = {2024-03-11}, + abstract = {Open-source software on the Perseverance mission is helping NASA explore a GNU world}, + howpublished = {https://www.itpro.com/software/linux/360542/linux-on-mars}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/P97IG6R9/linux-on-mars.html} +} + +@article{oparindeKeyDevelopmentsGlobal2024, + title = {Key Developments in Global Scholarly Publishing: {{Negotiating}} a Double-Edged Sword}, + shorttitle = {Key Developments in Global Scholarly Publishing}, + author = {Oparinde, Kunle and Govender, Vaneshree and Adedokun, Theophilus and Agbede, Grace Temiloluwa and Thungo, Sithabile}, + year = {2024}, + journal = {Learned Publishing}, + volume = {37}, + number = {3}, + pages = {e1604}, + issn = {1741-4857}, + doi = {10.1002/leap.1604}, + urldate = {2024-12-11}, + abstract = {Over the last few years, the publishing industry has experienced significant changes and developments, most of which have had a positive influence on scholarly publishing. For instance, the gradual popularity of open access publishing has contributed to the wider access and readership of published materials. Also, the recent development in the abilities of artificial intelligence (AI) tools to assist in the publication process is laudable for its potential. The gradual shift from print to online publication is also a commendable development in global publishing. Not without their own challenges, these developments, among others, have mostly impacted global publishing in a positive way. In the current study, the researchers' argument stems from the notion that although these developments are invaluable, there are accompanying impediments that publishing professionals as well as publishing outlets must consider. In response to these developments, role-players in the publishing industry must constantly reassess their publishing processes in order to carefully manage and negotiate what is termed by this study as a `double-edged sword' (capable of having positive and negative consequences). 
This study reviews existing studies, draws views from publishing experts, and seeks opinions from scholars to establish methods of negotiating some of the key developments in global publishing.}, + copyright = {{\copyright} 2024 The Authors. Learned Publishing published by John Wiley \& Sons Ltd on behalf of ALPSP.}, + langid = {english}, + keywords = {publishing professionals,scholarly publishing,transformation}, + file = {/home/michaelb/Zotero/storage/4HXZJ9T6/Oparinde et al. - 2024 - Key developments in global scholarly publishing Negotiating a double-edged sword.pdf;/home/michaelb/Zotero/storage/F47W47AW/leap.html} +} + +@book{popperLogicScientificDiscovery2005, + title = {The {{Logic}} of {{Scientific Discovery}}}, + author = {Popper, Karl}, + year = {2005}, + month = nov, + edition = {2}, + publisher = {Routledge}, + address = {London}, + doi = {10.4324/9780203994627}, + abstract = {Described by the philosopher A.J. Ayer as a work of 'great originality and power', this book revolutionized contemporary thinking on science and knowledge. Ideas such as~the now legendary doctrine of 'falsificationism' electrified the scientific community, influencing even working scientists, as well as post-war philosophy. This astonishing work ranks alongside The Open Society and Its Enemies as one of Popper's most enduring books and contains insights and arguments that demand to be read to this day.}, + isbn = {978-0-203-99462-7}, + file = {/home/michaelb/Zotero/storage/ETAI2LMN/Popper - 2005 - The Logic of Scientific Discovery.pdf} +} + @article{pridemoreReplicationCriminologySocial2018, title = {Replication in {{Criminology}} and the {{Social Sciences}}}, author = {Pridemore, William Alex and Makel, Matthew C. and Plucker, Jonathan A.}, @@ -289,7 +1054,6 @@ publisher = {Annual Reviews}, issn = {2572-4568}, doi = {10.1146/annurev-criminol-032317-091849}, - url = {https://www.annualreviews.org/content/journals/10.1146/annurev-criminol-032317-091849}, urldate = {2024-11-06}, abstract = {Replication is a hallmark of science. In recent years, some medical sciences and behavioral sciences struggled with what came to be known as replication crises. As a field, criminology has yet to address formally the threats to our evidence base that might be posed by large-scale and systematic replication attempts, although it is likely we would face challenges similar to those experienced by other disciplines. In this review, we outline the basics of replication, summarize reproducibility problems found in other fields, undertake an original analysis of the amount and nature of replication studies appearing in criminology journals, and consider how criminology can begin to assess more formally the robustness of our knowledge through encouraging a culture of replication.}, langid = {english}, @@ -313,6 +1077,37 @@ file = {/home/michaelb/Zotero/storage/3Z39JCGR/doiLanding.html} } +@article{rowbottomKuhnVsPopper2011, + title = {Kuhn vs. {{Popper}} on Criticism and Dogmatism in Science: A Resolution at the Group Level}, + shorttitle = {Kuhn vs. {{Popper}} on Criticism and Dogmatism in Science}, + author = {Rowbottom, Darrell P.}, + year = {2011}, + month = mar, + journal = {Studies in History and Philosophy of Science Part A}, + volume = {42}, + number = {1}, + pages = {117--124}, + issn = {0039-3681}, + doi = {10.1016/j.shpsa.2010.11.031}, + urldate = {2024-12-13}, + abstract = {Popper repeatedly emphasised the significance of a critical attitude, and a related critical method, for scientists. 
Kuhn, however, thought that unquestioning adherence to the theories of the day is proper; at least for `normal scientists'. In short, the former thought that dominant theories should be attacked, whereas the latter thought that they should be developed and defended (for the vast majority of the time). Both seem to have missed a trick, however, due to their apparent insistence that each individual scientist should fulfil similar functions (at any given point in time). The trick is to consider science at the group level; and doing so shows how puzzle solving and `offensive' critical activity can simultaneously have a legitimate place in science. This analysis shifts the focus of the debate. The crucial question becomes `How should the balance between functions be struck?'}, + file = {/home/michaelb/Zotero/storage/KFYID48X/Rowbottom - 2011 - Kuhn vs. Popper on criticism and dogmatism in science a resolution at the group level.pdf;/home/michaelb/Zotero/storage/PUFCMHFR/S003936811000110X.html} +} + +@inproceedings{sanguansatFeatureMatricizationDocument2012, + title = {Feature Matricization for Document Classification}, + booktitle = {2012 {{IEEE International Conference}} on {{Signal Processing}}, {{Communication}} and {{Computing}} ({{ICSPCC}} 2012)}, + author = {Sanguansat, Parinya}, + year = {2012}, + month = aug, + pages = {745--749}, + doi = {10.1109/ICSPCC.2012.6335622}, + urldate = {2024-12-16}, + abstract = {Generally, the dimension of feature vector in text classification depends on the number of words in the specific domain. Many documents of considered categories make it numerous. Therefore, the dimension of feature vector is very high that makes it consumes a lot of time and memory to process. Moreover, it is a cause of the small sample size problem when the number of available training documents is far smaller than the dimension of these feature vectors. This paper proposes the alternative technique of dimensionality reduction for the feature vector in two-dimensional manner by previously transforming the feature vector to the feature matrix and then using Two-Dimensional Principal Component Analysis (2DPCA) for reducing the dimension of this feature matrix. Based on 2DPCA, the original weighted term matrix is not necessary to store in the memory anymore because the scatter matrix of 2DPCA can be computed incrementally. The small reduction in matrix form impacts to the plenty of dimensionality reduction in vector form. 
From the experimental results on well-known dataset, the proposed method not only significantly reduce the dimensionality but also achieve the higher accuracy rate than the original feature space.}, + keywords = {Accuracy,Covariance matrix,Document classification,Feature extraction,Machine learning,Matricization,Principal component analysis,Support vector machines,Vectors}, + file = {/home/michaelb/Zotero/storage/E4GPLWP6/Sanguansat - 2012 - Feature matricization for document classification.pdf;/home/michaelb/Zotero/storage/MY7DF92Q/6335622.html} +} + @article{sarafoglouSurveyHowPreregistration2022, title = {A Survey on How Preregistration Affects the Research Workflow: Better Science but More Work}, shorttitle = {A Survey on How Preregistration Affects the Research Workflow}, @@ -325,7 +1120,6 @@ pages = {211997}, publisher = {Royal Society}, doi = {10.1098/rsos.211997}, - url = {https://royalsocietypublishing.org/doi/10.1098/rsos.211997}, urldate = {2024-11-06}, abstract = {The preregistration of research protocols and analysis plans is a main reform innovation to counteract confirmation bias in the social and behavioural sciences. While theoretical reasons to preregister are frequently discussed in the literature, the individually experienced advantages and disadvantages of this method remain largely unexplored. The goal of this exploratory study was to identify the perceived benefits and challenges of preregistration from the researcher's perspective. To this end, we surveyed 355 researchers, 299 of whom had used preregistration in their own work. The researchers indicated the experienced or expected effects of preregistration on their workflow. The results show that experiences and expectations are mostly positive. Researchers in our sample believe that implementing preregistration improves or is likely to improve the quality of their projects. Criticism of preregistration is primarily related to the increase in work-related stress and the overall duration of the project. While the benefits outweighed the challenges for the majority of researchers with preregistration experience, this was not the case for the majority of researchers without preregistration experience. The experienced advantages and disadvantages identified in our survey could inform future efforts to improve preregistration and thus help the methodology gain greater acceptance in the scientific community.}, keywords = {meta-science,open science,replication crisis}, @@ -345,7 +1139,6 @@ publisher = {SAGE Publications Inc}, issn = {1043-9862}, doi = {10.1177/1043986218777288}, - url = {https://doi.org/10.1177/1043986218777288}, urldate = {2024-11-06}, langid = {english}, file = {/home/michaelb/Zotero/storage/JEEQLPSS/Savolainen and VanEseltine - 2018 - Replication and Research Integrity in Criminology Introduction to the Special Issue.pdf} @@ -363,13 +1156,51 @@ pages = {240313}, publisher = {Royal Society}, doi = {10.1098/rsos.240313}, - url = {https://royalsocietypublishing.org/doi/10.1098/rsos.240313}, urldate = {2024-11-16}, abstract = {The scientific method is predicated on transparency---yet the pace at which transparent research practices are being adopted by the scientific community is slow. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation and data falsification. 
To mitigate these problems and bolster research credibility, open data and preregistration practices have gained traction in the natural and social sciences. However, the extent of their adoption in different disciplines is unknown. We introduce computational procedures to identify the transparency of a research field using large-scale text analysis and machine learning classifiers. Using political science and international relations as an illustrative case, we examine 93 931 articles across the top 160 political science and international relations journals between 2010 and 2021. We find that approximately 21\% of all statistical inference papers have open data and 5\% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly.}, keywords = {data sharing,journal policy,open science,preregistration}, file = {/home/michaelb/Zotero/storage/LZVK24S3/Scoggins and Robertson - 2024 - Measuring transparency in the social sciences political science and international relations.pdf} } +@article{smaldinoOpenScienceModified2019, + title = {Open Science and Modified Funding Lotteries Can Impede the Natural Selection of Bad Science}, + author = {Smaldino, Paul E. and Turner, Matthew A. and Contreras Kallens, Pablo A.}, + year = {2019}, + month = jul, + journal = {Royal Society Open Science}, + volume = {6}, + number = {7}, + pages = {190194}, + publisher = {Royal Society}, + doi = {10.1098/rsos.190194}, + urldate = {2024-12-13}, + abstract = {Assessing scientists using exploitable metrics can lead to the degradation of research methods even without any strategic behaviour on the part of individuals, via `the natural selection of bad science.' Institutional incentives to maximize metrics like publication quantity and impact drive this dynamic. Removing these incentives is necessary, but institutional change is slow. However, recent developments suggest possible solutions with more rapid onsets. These include what we call open science improvements, which can reduce publication bias and improve the efficacy of peer review. In addition, there have been increasing calls for funders to move away from prestige- or innovation-based approaches in favour of lotteries. We investigated whether such changes are likely to improve the reproducibility of science even in the presence of persistent incentives for publication quantity through computational modelling. We found that modified lotteries, which allocate funding randomly among proposals that pass a threshold for methodological rigour, effectively reduce the rate of false discoveries, particularly when paired with open science improvements that increase the publication of negative results and improve the quality of peer review. In the absence of funding that targets rigour, open science improvements can still reduce false discoveries in the published literature but are less likely to improve the overall culture of research practices that underlie those publications.}, + keywords = {cultural evolution,funding,open science,replication,reproducibility}, + file = {/home/michaelb/Zotero/storage/RUTXYEJ7/Smaldino et al. 
- 2019 - Open science and modified funding lotteries can impede the natural selection of bad science.pdf}
+}
+
+@book{SocietyInternetHow2019,
+  title = {Society and the {{Internet}}: {{How Networks}} of {{Information}} and {{Communication}} Are {{Changing Our Lives}}},
+  shorttitle = {Society and the {{Internet}}},
+  editor = {Graham, Mark and Dutton, William H.},
+  year = {2019},
+  month = jul,
+  publisher = {Oxford University Press},
+  doi = {10.1093/oso/9780198843498.001.0001},
+  urldate = {2024-03-11},
+  abstract = {How is society being reshaped by the continued diffusion and increasing centrality of the Internet in everyday life and work?},
+  isbn = {978-0-19-187932-6},
+  langid = {english},
+  file = {/home/michaelb/Zotero/storage/UJ6PRV6G/35088.html}
+}
+
+@unpublished{thagardInternetEpistemologyContributions1997,
+  title = {Internet {{Epistemology}}: {{Contributions}} of {{New Information Technologies}} to {{Scientific Research}}},
+  shorttitle = {Internet {{Epistemology}}},
+  author = {Thagard, P.},
+  year = {1997},
+  file = {/home/michaelb/Zotero/storage/5JFRFHZN/THAIEC.html}
+}
+
 @article{thibaultReflectionsPreregistrationCore2024,
   title = {Reflections on {{Preregistration}}: {{Core Criteria}}, {{Badges}}, {{Complementary Workflows}}},
   shorttitle = {Reflections on {{Preregistration}}},
@@ -382,7 +1213,6 @@
   publisher = {JOTE Publishers},
   issn = {2667-1204,},
   doi = {10.36850/mr6},
-  url = {https://journal.trialanderror.org/pub/reflections-on-preregistration/release/2},
   urldate = {2024-11-06},
   abstract = {Clinical trials are routinely preregistered. In psychology and the social sciences, however, only a small percentage of studies are preregistered, and those preregistrations often contain ambiguities. As advocates strive for broader uptake and effective use of preregistration, they can benefit from drawing on the experience of preregistration in clinical trials and adapting some of those successes to the psychology and social sciences context. We recommend that individuals and organizations who promote preregistration: (1) Establish core preregistration criteria required to consider a preregistration complete; (2) Award preregistered badges only to articles that meet the badge criteria; and (3) Leverage complementary workflows that provide a similar function as preregistration.},
   langid = {english},
@@ -401,7 +1231,6 @@
   pages = {5424--5433},
   issn = {1554-3528},
   doi = {10.3758/s13428-023-02277-0},
-  url = {https://doi.org/10.3758/s13428-023-02277-0},
   urldate = {2024-10-15},
   abstract = {Preregistration has gained traction as one of the most promising solutions to improve the replicability of scientific effects. In this project, we compared 193 psychology studies that earned a Preregistration Challenge prize or preregistration badge to 193 related studies that were not preregistered. In contrast to our theoretical expectations and prior research, we did not find that preregistered studies had a lower proportion of positive results (Hypothesis 1), smaller effect sizes (Hypothesis 2), or fewer statistical errors (Hypothesis 3) than non-preregistered studies. Supporting our Hypotheses 4 and 5, we found that preregistered studies more often contained power analyses and typically had larger sample sizes than non-preregistered studies. Finally, concerns about the publishability and impact of preregistered studies seem unwarranted, as preregistered studies did not take longer to publish and scored better on several impact measures. 
Overall, our data indicate that preregistration has beneficial effects in the realm of statistical power and impact, but we did not find robust evidence that preregistration prevents p-hacking and HARKing (Hypothesizing After the Results are Known).}, langid = {english}, @@ -409,22 +1238,121 @@ file = {/home/michaelb/Zotero/storage/8LPRN7WQ/van den Akker et al. - 2024 - Preregistration in practice A comparison of preregistered and non-preregistered studies in psycholo.pdf} } -@incollection{wikstromSituationalActionTheory2019, - title = {Situational {{Action Theory}}: {{A General}}, {{Dynamic}} and {{Mechanism-Based Theory}} of {{Crime}} and {{Its Causes}}}, - shorttitle = {Situational {{Action Theory}}}, - booktitle = {Handbook on {{Crime}} and {{Deviance}}}, - author = {Wikstr{\"o}m, Per-Olof H.}, - editor = {Krohn, Marvin D. and Hendrix, Nicole and Penly Hall, Gina and Lizotte, Alan J.}, - year = {2019}, - pages = {259--281}, - publisher = {Springer International Publishing}, - address = {Cham}, - doi = {10.1007/978-3-030-20779-3_14}, - url = {https://doi.org/10.1007/978-3-030-20779-3_14}, - urldate = {2024-11-16}, - abstract = {The core argument of Situational Action Theory (SAT) is that people ultimately commit acts of crime because they find them viable and acceptable in the circumstance (and there is no relevant and strong enough deterrent) or because they fail to act in accordance with their own personal morals (i.e., fail to exercise self-control) in circumstances when externally pressurised to act otherwise. Situational Action Theory is a general, dynamic and mechanism-based theory of crime and its causes that analyzes crime as moral actions. It proposes to explain all kinds of crime and rule-breaking more broadly (hence general), stresses the importance of the person-environment interaction and its changes (hence dynamic), and focuses on identifying key basic explanatory processes involved in crime causation (hence mechanistic). This chapter gives an overview of the basic assumptions, central concepts and key explanatory propositions of Situational Action Theory.}, - isbn = {978-3-030-20779-3}, +@article{waiteINTERNETKNOWLEDGEEXCHANGE2021, + title = {{{INTERNET KNOWLEDGE EXCHANGE AND CO-AUTHORSHIP AS FACILITATORS IN SCIENTIFIC RESEARCH}}}, + author = {Waite, Vesna}, + year = {2021}, + month = mar, + journal = {Journal of Teaching English for Specific and Academic Purposes}, + number = {0}, + pages = {043--050}, + issn = {2334-9212}, + doi = {10.22190/JTESAP2101043W}, + urldate = {2024-12-13}, + abstract = {The aim of this paper is to determine to what extent the use of Internet as a way of acquiring information for research purposes is a successful tool. The Internet can facilitate the research in different ways, some of which are being presented in the paper. Researchers have access to a wide range of databases available on the Internet, also having the opportunity to use sites designed as a social media for academics such as ResearchGate or Academia. Apart from that, there exists some degree of correspondence between open access philosophy and hacker ethics which is being related to academia to point to the possible ethic value researches have towards one another. 
The paper focuses on advantages of using Internet for the purposes of facilitating research, at the same time introducing the topic of collaboration and co-authorship as vital in today's `publish-or-perish' academia world.},
+  copyright = {Copyright (c) 2021 Journal of Teaching English for Specific and Academic Purposes},
+  langid = {english},
+  file = {/home/michaelb/Zotero/storage/3K78JFBB/Waite - 2021 - INTERNET KNOWLEDGE EXCHANGE AND CO-AUTHORSHIP AS FACILITATORS IN SCIENTIFIC RESEARCH.pdf}
+}
+
+@article{wardenInternetScienceCommunication2010,
+  title = {The {{Internet}} and Science Communication: Blurring the Boundaries},
+  shorttitle = {The {{Internet}} and Science Communication},
+  author = {Warden, R.},
+  year = {2010},
+  month = dec,
+  journal = {ecancermedicalscience},
+  volume = {4},
+  pages = {203},
+  issn = {1754-6605},
+  doi = {10.3332/ecancer.2010.203},
+  urldate = {2024-12-13},
+  abstract = {Scientific research is heavily dependent on communication and collaboration. Research does not exist in a bubble; scientific work must be communicated in order to add it to the body of knowledge within a scientific community, so that its members may `stand on the shoulders of giants' and benefit from all that has come before. The effectiveness of scientific communication is crucial to the pace of scientific progress: in all its forms it enables ideas to be formulated, results to be compared, and replications and improvements to be made. The sharing of science is a foundational aspect of the scientific method. This paper, part of the policy research within the FP7 EUROCANCERCOMS project, discusses how the Internet has changed communication by cancer researchers and how it has the potential to change it still more in the future. It will detail two broad types of communication: formal and informal, and how these are changing with the use of new web tools and technologies.},
+  pmcid = {PMC3234032},
+  pmid = {22276045},
+  file = {/home/michaelb/Zotero/storage/6E5I3X22/Warden - 2010 - The Internet and science communication blurring the boundaries.pdf}
+}
+
+@article{wilkinsonTestingNullHypothesis2013,
+  title = {Testing the Null Hypothesis: {{The}} Forgotten Legacy of {{Karl Popper}}?},
+  shorttitle = {Testing the Null Hypothesis},
+  author = {Wilkinson, Mick},
+  year = {2013},
+  month = may,
+  journal = {Journal of Sports Sciences},
+  volume = {31},
+  number = {9},
+  pages = {919--920},
+  publisher = {Routledge},
+  issn = {0264-0414},
+  doi = {10.1080/02640414.2012.753636},
+  urldate = {2024-12-13},
+  abstract = {Testing of the null hypothesis is a fundamental aspect of the scientific method and has its basis in the falsification theory of Karl Popper. Null hypothesis testing makes use of deductive reasoning to ensure that the truth of conclusions is irrefutable. In contrast, attempting to demonstrate the new facts on the basis of testing the experimental or research hypothesis makes use of inductive reasoning and is prone to the problem of the Uniformity of Nature assumption described by David Hume in the eighteenth century. 
Despite this issue and the well documented solution provided by Popper's falsification theory, the majority of publications are still written such that they suggest the research hypothesis is being tested. This is contrary to accepted scientific convention and possibly highlights a poor understanding of the application of conventional significance-based data analysis approaches. Our work should remain driven by conjecture and attempted falsification such that it is always the null hypothesis that is tested. The write up of our studies should make it clear that we are indeed testing the null hypothesis and conforming to the established and accepted philosophical conventions of the scientific method.}, + pmid = {23249368}, + keywords = {philosophy,science,statistics}, + file = {/home/michaelb/Zotero/storage/BYBEMADP/Wilkinson - 2013 - Testing the null hypothesis The forgotten legacy of Karl Popper.pdf} +} + +@article{willinskyUnacknowledgedConvergenceOpen2005, + title = {The Unacknowledged Convergence of Open Source, Open Access, and Open Science}, + author = {Willinsky, John}, + year = {2005}, + month = aug, + journal = {First Monday}, + issn = {1396-0466}, + doi = {10.5210/fm.v10i8.1265}, + urldate = {2024-12-11}, + abstract = {A number of open initiatives are actively resisting the extension of intellectual property rights. Among these developments, three prominent instances --- open source software, open access to research and scholarship, and open science --- share not only a commitment to the unrestricted exchange of information and ideas, but economic principles based on (1) the efficacy of free software and research; (2) the reputation--building afforded by public access and patronage; and, (3) the emergence of a free--or--subscribe access model. Still, with this much in common, the strong sense of convergence among these open initiatives has yet to be fully realized, to the detriment of the larger, common issue. By drawing on David's (2004; 2003; 2000; 1998) economic work on open science and Weber's (2004) analysis of open source, this paper seeks to make that convergence all the more apparent, as well as worth pursuing, by those interested in furthering this alternative approach, which would treat intellectual properties as public goods.}, + copyright = {Copyright (c)}, + langid = {english}, + file = {/home/michaelb/Zotero/storage/3E4G42JK/Unacknowledged convergence of open source, open access, and open science.pdf} +} + +@article{xuImpactInternetAccess2021, + title = {The Impact of Internet Access on Research Output - a Cross-Country Study}, + author = {Xu, Xu and Reed, Markum}, + year = {2021}, + month = sep, + journal = {Information Economics and Policy}, + volume = {56}, + pages = {100914}, + issn = {0167-6245}, + doi = {10.1016/j.infoecopol.2021.100914}, + urldate = {2024-12-13}, + abstract = {There are large variations in research output among nations despite the rapid globalization progress. This article provides a new angle to help explain such variations. In this article, we study the impact of internet penetration on the research output of an economy. Using a country-level panel dataset, we find that higher internet penetration increases the volume of research output in an economy. The results are robust to a number of specifications, including an instrumental variable approach that addresses the endogeneity of internet penetration. 
We also find some evidence showing that the impact of internet penetration on research output quantity decreases as the size of fixed broadband users increase in an economy. The effects of internet access on research quality is less conclusive. Results suggest that broadening the access of internet is important for research output boosting or innovation in general.},
+  keywords = {Academic productivity,Internet access,Internet penetration,Publication,Research output,Research quality},
+  file = {/home/michaelb/Zotero/storage/IJNNM2CJ/S0167624521000020.html}
+}
+
+@inproceedings{zengCBCClusteringBased2003,
+  title = {{{CBC}}: Clustering Based Text Classification Requiring Minimal Labeled Data},
+  shorttitle = {{{CBC}}},
+  booktitle = {Third {{IEEE International Conference}} on {{Data Mining}}},
+  author = {Zeng, Hua-Jun and Wang, Xuan-Hui and Chen, Zheng and Lu, Hongjun and Ma, Wei-Ying},
+  year = {2003},
+  month = nov,
+  pages = {443--450},
+  doi = {10.1109/ICDM.2003.1250951},
+  urldate = {2024-12-16},
+  abstract = {Semisupervised learning methods construct classifiers using both labeled and unlabeled training data samples. While unlabeled data samples can help to improve the accuracy of trained models to certain extent, existing methods still face difficulties when labeled data is not sufficient and biased against the underlying data distribution. We present a clustering based classification (CBC) approach. Using this approach, training data, including both the labeled and unlabeled data, is first clustered with the guidance of the labeled data. Some of unlabeled data samples are then labeled based on the clusters obtained. Discriminative classifiers can subsequently be trained with the expanded labeled dataset. The effectiveness of the proposed method is justified analytically. Our experimental results demonstrated that CBC outperforms existing algorithms when the size of labeled dataset is very small.},
+  keywords = {Asia,Classification algorithms,Clustering algorithms,Computer science,Semisupervised learning,Supervised learning,Support vector machine classification,Support vector machines,Text categorization,Training data},
+  file = {/home/michaelb/Zotero/storage/8FSJCPWF/1250951.html}
+}
+
+@article{zenk-moltgenFactorsInfluencingData2018,
+  title = {Factors Influencing the Data Sharing Behavior of Researchers in Sociology and Political Science},
+  author = {{Zenk-M{\"o}ltgen}, Wolfgang and Akdeniz, Esra and Katsanidou, Alexia and Na{\ss}hoven, Verena and Balaban, Ebru},
+  year = {2018},
+  month = jun,
+  journal = {Journal of Documentation},
+  volume = {74},
+  number = {5},
+  pages = {1053--1073},
+  publisher = {Emerald Publishing Limited},
+  issn = {0022-0418},
+  doi = {10.1108/JD-09-2017-0126},
+  urldate = {2024-12-15},
+  abstract = {Open data and data sharing should improve transparency of research. The purpose of this paper is to investigate how different institutional and individual factors affect the data sharing behavior of authors of research articles in sociology and political science. Desktop research analyzed attributes of sociology and political science journals (n=262) from their websites. A second data set of articles (n=1,011; published 2012-2014) was derived from ten of the main journals (five from each discipline) and stated data sharing was examined. A survey of the authors used the Theory of Planned Behavior to examine motivations, behavioral control, and perceived norms for sharing data. Statistical tests (Spearman's {$\rho$}, {$\chi^2$}) examined correlations and associations. Although many journals have a data policy for their authors (78 percent in sociology, 44 percent in political science), only around half of the empirical articles stated that the data were available, and for only 37 percent of the articles could the data be accessed. Journals with higher impact factors, those with a stated data policy, and younger journals were more likely to offer data availability. Of the authors surveyed, 446 responded (44 percent). Statistical analysis indicated that authors' attitudes, reported past behavior, social norms, and perceived behavioral control affected their intentions to share data. Less than 50 percent of the authors contacted provided responses to the survey. Results indicate that data sharing would improve if journals had explicit data sharing policies but authors also need support from other institutions (their universities, funding councils, and professional associations) to improve data management skills and infrastructures. This paper builds on previous similar research in sociology and political science and explains some of the barriers to data sharing in social sciences by combining journal policies, published articles, and authors' responses to a survey.},
+  langid = {english},
+  file = {/home/michaelb/Zotero/storage/2VF37P6B/Zenk-Möltgen et al. - 2018 - Factors influencing the data sharing behavior of researchers in sociology and political science.pdf;/home/michaelb/Zotero/storage/S9XMU592/html.html}
+}
diff --git a/make.sh b/make.sh
index 5ef128b..88ade72 100755
--- a/make.sh
+++ b/make.sh
@@ -11,7 +11,6 @@ pandoc -i "$IN" \
 -o "$OUT" \
 --csl=apa-7th-edition.csl \
 --citeproc \
- --filter pandoc-crossref \
 --lua-filter=filters/first-line-indent.lua \
 --citation-abbreviations=citation-abbreviations.csl
diff --git a/modify-pdf.sh b/modify-pdf.sh
old mode 100644
new mode 100755