Data, Method, and Analysis of Open Science Practices in Sociology and Criminology Papers

Population

  • Papers in sociology and criminology that use data and statistical methods.
  • Focus on evaluating open science practices and statistical inference:
    • Pre-registration
    • Open data
    • Open materials
    • Open access
    • Statistical inference

Data Collection

  1. Journal Identification

    • Use the Clarivate Journal Citation Reports (JCR) API to obtain a comprehensive list of sociology and criminology journals.
    • Filter the list to include journals accessible through university licensing agreements.
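The filtering step above can be sketched in Python as follows. This is a minimal illustration, not the JCR data model. The `Journal` record, its fields, the category labels, and the set of licensed ISSNs are hypothetical stand-ins for whatever the JCR export and the library's holdings list actually provide. The filter matches on both print and electronic ISSNs, normalized without hyphens, so that journals licensed under only one of the two are not missed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Journal:
    """Hypothetical journal record; the real JCR export will differ."""
    title: str
    issn: Optional[str]
    eissn: Optional[str]
    category: str


def normalize_issn(issn: Optional[str]) -> Optional[str]:
    """Strip hyphens and whitespace, uppercase the check digit; None if not 8 characters."""
    if not issn:
        return None
    cleaned = issn.replace("-", "").strip().upper()
    return cleaned if len(cleaned) == 8 else None


def filter_licensed(journals: Iterable[Journal], licensed_issns: Set[str],
                    categories: Set[str]) -> List[Journal]:
    """Keep journals in the target categories whose print or electronic ISSN is licensed."""
    licensed = {normalize_issn(i) for i in licensed_issns} - {None}
    kept = []
    for journal in journals:
        if journal.category not in categories:
            continue
        ids = {normalize_issn(journal.issn), normalize_issn(journal.eissn)} - {None}
        if ids & licensed:  # any shared ISSN counts as licensed
            kept.append(journal)
    return kept
```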
  2. Metadata Download

    • Use APIs such as Crossref, Scopus, or Web of Science (WoS) to download metadata for all papers published between 2013 and 2023.
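For Crossref, the public REST API supports cursor-based deep paging, which can retrieve all metadata for one journal over a date range. The sketch below assumes the `/works` endpoint with `issn`, `from-pub-date`, and `until-pub-date` filters, pages of up to 1,000 `rows`, and the `next-cursor` field returned in each response. The optional `mailto` parameter identifies the client to Crossref. Scopus and Web of Science require institutional API keys and are not shown. Any ISSN or e-mail address passed in is a placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterator, Optional

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_url(issn: str, cursor: str = "*", rows: int = 1000,
                 start: str = "2013-01-01", end: str = "2023-12-31",
                 mailto: Optional[str] = None) -> str:
    """Build one Crossref /works request for a journal and publication-date window."""
    params = {
        "filter": f"issn:{issn},from-pub-date:{start},until-pub-date:{end}",
        "rows": rows,      # Crossref caps rows at 1,000 per request
        "cursor": cursor,  # "*" starts a new deep-paging session
    }
    if mailto:
        params["mailto"] = mailto  # identifies the client to Crossref
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"


def iter_works(issn: str, mailto: Optional[str] = None) -> Iterator[Dict]:
    """Yield every work record for one ISSN, following next-cursor until a page is empty."""
    cursor = "*"
    while True:
        url = crossref_url(issn, cursor=cursor, mailto=mailto)
        with urllib.request.urlopen(url, timeout=60) as response:
            message = json.load(response)["message"]
        items = message.get("items", [])
        if not items:
            return
        yield from items
        cursor = message["next-cursor"]
```

For example, `iter_works("1234-5678", mailto="you@example.org")` would yield one metadata dict per paper, with keys such as `DOI` and `title`.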
  3. Full-Text Retrieval

  4. Preprocessing

    • Convert collected papers to plain text using:
      • SciPDF Parser for PDF-to-text conversion.
      • HTML-to-text tools like html2text.
    • Standardize text format for subsequent analysis.
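The conversion and standardization steps could look like the sketch below. To stay dependency-free, it uses the standard library's `html.parser` as a rough stand-in for html2text. The standardization rules are assumptions about what "standardize" should mean here: Unicode NFKC normalization, rejoining words hyphenated at line breaks, collapsing whitespace, and lowercasing. Rejoining line-break hyphens also turns a genuinely hyphenated term split across lines, such as "pre-registered", into "preregistered". The keyword dictionaries should therefore list both spellings.

```python
import re
import unicodedata
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>/<style> and breaking lines at block tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &nbsp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def standardize(text: str) -> str:
    """Apply the assumed standardization rules to extracted text."""
    text = unicodedata.normalize("NFKC", text)                # ligatures, no-break spaces
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)  # rejoin line-break hyphenation
    text = re.sub(r"\s+", " ", text)                          # collapse whitespace
    return text.strip().lower()
```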
  5. Resource Management

    • Address potential resource constraints:
      • Use scalable data collection methods.
      • Leverage institutional resources (e.g., libraries and repositories).
      • Implement efficient workflows for text extraction and preprocessing (e.g., parallel processing across multiple CPU cores).
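Extraction and preprocessing are independent for each paper, so they parallelize naturally across cores. The sketch below uses `concurrent.futures.ProcessPoolExecutor` by default. `word_count` is a hypothetical placeholder task. The executor class is a parameter, so the same function can run with threads (for quick tests) or with processes (for CPU-bound work).

```python
import concurrent.futures
import os
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def word_count(text: str) -> int:
    """Hypothetical per-paper task standing in for real extraction or cleaning."""
    return len(text.split())


def process_corpus(docs: Sequence[str], task: Callable[[str], T],
                   executor_cls=concurrent.futures.ProcessPoolExecutor,
                   max_workers: Optional[int] = None,
                   chunksize: int = 64) -> List[T]:
    """Apply `task` to every document in parallel; results keep the input order."""
    workers = max_workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # chunksize batches documents per worker round-trip (ignored by thread pools)
        return list(pool.map(task, docs, chunksize=chunksize))
```

With `ProcessPoolExecutor`, the task must be a module-level function so that it can be pickled. The call should also sit under an `if __name__ == "__main__":` guard on platforms that start workers with spawn, as Windows and macOS do by default.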

Classification

  1. Operationalization

    • Define clear criteria for identifying open science practices:
      • Pre-registration: Terms like "pre-registered."
      • Open data: Phrases like "data availability statement."
      • Open materials: Statements like "materials available on request."
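The example criteria above can be turned into a first rule-based check with regular expressions. The patterns below are hypothetical starting points built only from the phrases listed in the plan. They match surface wording, so a sentence such as "the study was not pre-registered" would still be flagged. This is one reason to follow the dictionaries with trained classifiers.

```python
import re
from typing import Dict

# Hypothetical starter patterns derived only from the example phrases in the plan.
PATTERNS = {
    "preregistration": [r"\bpre-?registered\b", r"\bpre-?registration\b"],
    "open_data": [r"\bdata availability statement\b"],
    "open_materials": [r"\bmaterials (?:are )?available (?:up)?on request\b"],
}

COMPILED = {
    practice: [re.compile(p, re.IGNORECASE) for p in patterns]
    for practice, patterns in PATTERNS.items()
}


def flag_practices(text: str) -> Dict[str, bool]:
    """Return, per practice, whether any of its patterns occurs in the text."""
    return {practice: any(rx.search(text) for rx in regexes)
            for practice, regexes in COMPILED.items()}
```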
  2. Keyword Dictionary Creation

    • Develop dictionaries of terms and phrases associated with each open science practice.
    • Base dictionaries on prior research (e.g., @scogginsMeasuringTransparencySocial2024a).
    • Compare the dictionaries from different sources and merge them into a combined dictionary.
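Joining dictionaries from several sources can be sketched as a per-practice set union. Jaccard overlap (shared terms divided by all distinct terms) is one simple way to compare them. Both the practice names and the normalization rule (lowercasing, collapsing whitespace) are assumptions here. In a real run, the terms would come from the prior literature and the project's own list.

```python
from typing import Dict, Iterable, List, Mapping


def normalize_term(term: str) -> str:
    """Lowercase and collapse internal whitespace (assumed normalization rule)."""
    return " ".join(term.lower().split())


def merge_dictionaries(*dictionaries: Mapping[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Union the terms for each practice across all source dictionaries."""
    merged: Dict[str, set] = {}
    for dictionary in dictionaries:
        for practice, terms in dictionary.items():
            merged.setdefault(practice, set()).update(normalize_term(t) for t in terms)
    return {practice: sorted(terms) for practice, terms in merged.items()}


def jaccard_by_practice(a: Mapping[str, Iterable[str]],
                        b: Mapping[str, Iterable[str]]) -> Dict[str, float]:
    """Shared terms / distinct terms per practice; 1.0 when both are empty."""
    overlap = {}
    for practice in sorted(set(a) | set(b)):
        terms_a = {normalize_term(t) for t in a.get(practice, [])}
        terms_b = {normalize_term(t) for t in b.get(practice, [])}
        union = terms_a | terms_b
        overlap[practice] = len(terms_a & terms_b) / len(union) if union else 1.0
    return overlap
```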
  3. Manual Annotation

    • Manually classify a subset of 1,000 to 2,000 papers to train the machine learning models.
    • Use stratified sampling to ensure diversity in:
      • Journals
      • Publication years
      • Subfields within sociology and criminology.
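Stratified sampling with proportional allocation can be sketched as follows. Papers are grouped into strata by a key function, and each stratum receives a share of the annotation budget proportional to its size. Leftover slots from rounding go to the strata with the largest remainders, so the sample size is exactly `n`. The record fields (`journal`, `year`) and the fixed seed are illustrative. A subfield label would have to come from the metadata or be assigned separately.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def allocate(sizes: Dict[Hashable, int], n: int) -> Dict[Hashable, int]:
    """Proportional allocation of n slots, using largest remainders so totals equal n."""
    total = sum(sizes.values())
    alloc = {k: n * size // total for k, size in sizes.items()}      # floor of each quota
    remainders = {k: n * size % total for k, size in sizes.items()}
    leftover = n - sum(alloc.values())
    for k in sorted(sizes, key=lambda k: remainders[k], reverse=True)[:leftover]:
        alloc[k] += 1                                                # hand out rounding slack
    return alloc


def stratified_sample(papers: Sequence[T], n: int,
                      key: Callable[[T], Hashable], seed: int = 2024) -> List[T]:
    """Draw exactly n papers, sampled without replacement within each stratum."""
    if not 0 <= n <= len(papers):
        raise ValueError("n must be between 0 and the number of papers")
    if n == 0:
        return []
    strata = defaultdict(list)
    for paper in papers:
        strata[key(paper)].append(paper)
    quotas = allocate({k: len(v) for k, v in strata.items()}, n)
    rng = random.Random(seed)  # fixed seed keeps the annotation sample reproducible
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, quotas[k]))
    return sample
```

With many small strata (for example, journal by year), proportional allocation can leave some strata with no papers. Guaranteeing a minimum per stratum would need an extra rule.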
  4. Feature Extraction

    • Create document-feature matrices (DFMs) using keyword dictionaries to prepare data for machine learning.
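A dictionary-based document-feature matrix can be built by counting how often each dictionary term occurs in each paper. The sketch below uses one column per term, named `practice/term`, with case-insensitive whole-phrase matching. The dictionary contents are hypothetical. Counting per practice instead of per term would be an equally valid design, with fewer features.

```python
import re
from typing import Iterable, List, Mapping, Tuple


def build_dfm(docs: Iterable[str], dictionary: Mapping[str, Iterable[str]]
              ) -> Tuple[List[str], List[List[int]]]:
    """Count case-insensitive whole-phrase matches of every dictionary term in every doc."""
    features = [(practice, term)
                for practice in sorted(dictionary)
                for term in sorted(set(dictionary[practice]))]
    # (?<!\w) and (?!\w) act like word boundaries even when a term starts or ends with punctuation
    regexes = [re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
               for _, term in features]
    matrix = [[len(rx.findall(doc)) for rx in regexes] for doc in docs]
    names = [f"{practice}/{term}" for practice, term in features]
    return names, matrix
```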
  5. Model Training

    • Train multiple machine learning models:
      • Naive Bayes
      • Logistic Regression
      • Support Vector Machines
      • Random Forests
      • Gradient Boosted Trees
    • Evaluate model performance to select the best classifier for each open science practice.
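The training and selection loop can be sketched without external dependencies. The sketch has three parts: a Bernoulli Naive Bayes classifier on binarized DFM features with Laplace smoothing, a majority-class baseline, and a helper that fits each candidate on a training split and picks the one with the highest F1 on a held-out split. F1 for the positive class is an assumed selection criterion, because the plan does not name one. Scikit-learn estimators such as `LogisticRegression`, `LinearSVC`, `RandomForestClassifier`, and `GradientBoostingClassifier` follow the same `fit`/`predict` interface and could be passed in place of the classes here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]


class BernoulliNB:
    """Bernoulli Naive Bayes: each feature is treated as present (>0) or absent."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing; must be > 0 to avoid log(0)

    def fit(self, X: Matrix, y: Sequence[int]) -> "BernoulliNB":
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        self.log_prior_, self.log_on_, self.log_off_ = {}, {}, {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            p_on = [(sum(1 for r in rows if r[j] > 0) + self.alpha)
                    / (len(rows) + 2 * self.alpha) for j in range(n_features)]
            self.log_on_[c] = [math.log(p) for p in p_on]       # log P(feature present | c)
            self.log_off_[c] = [math.log(1 - p) for p in p_on]  # log P(feature absent | c)
        return self

    def predict(self, X: Matrix) -> List[int]:
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior_[c] + sum(
                    on if v > 0 else off
                    for v, on, off in zip(x, self.log_on_[c], self.log_off_[c]))
                for c in self.classes_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


class MajorityBaseline:
    """Always predicts the most frequent training label (ties go to the smallest label)."""

    def fit(self, X: Matrix, y: Sequence[int]) -> "MajorityBaseline":
        labels = list(y)
        self.label_ = max(sorted(set(labels)), key=labels.count)
        return self

    def predict(self, X: Matrix) -> List[int]:
        return [self.label_] * len(X)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_best(candidates: Dict[str, Callable[[], object]], X_train, y_train,
                X_test, y_test) -> Tuple[str, Dict[str, float]]:
    """Fit each candidate on the training split and rank by held-out F1."""
    scores = {name: f1_score(y_test, make().fit(X_train, y_train).predict(X_test))
              for name, make in candidates.items()}
    return max(scores, key=scores.get), scores
```

With 1,000 to 2,000 labeled papers, k-fold cross-validation would give more stable comparisons than the single split shown here.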
  6. Automated Classification

    • Apply the best-performing models to classify the entire dataset.
    • Automate the identification of open science practices across all collected papers.

Analysis

  1. Descriptive Analysis

    • Examine temporal trends in the adoption of open science practices over the past decade.
      • Compare practices across sociology and criminology.
      • Compare practices across journals.
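Temporal trends can be summarized as the share of papers flagged for a practice in each year, computed separately by field or by journal. The record layout below is a hypothetical output of the classification step: `field`, `journal`, `year`, and one boolean per practice.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Tuple


def adoption_by_year(records: Iterable[Mapping], practice: str,
                     group_key: str = "field") -> Dict[Tuple[Hashable, int], float]:
    """Share of papers flagged for `practice` in each (group, year) cell."""
    counts = defaultdict(lambda: [0, 0])  # (group, year) -> [flagged, total]
    for record in records:
        cell = counts[(record[group_key], record["year"])]
        cell[0] += bool(record[practice])
        cell[1] += 1
    return {cell: flagged / total for cell, (flagged, total) in sorted(counts.items())}
```

The default grouping compares sociology with criminology. Passing `group_key="journal"` gives the comparison across journals.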
  2. Evaluation of Results

    • Identify patterns in:
      • Prevalence of pre-registration, open data, open materials, and open access.
      • Statistical inference methods.
  3. Ethical Considerations

    • Ensure all methodologies comply with ethical and legal guidelines.
    • Avoid unauthorized sources such as Sci-Hub or LibGen.
  4. Broader Implications

    • Contribute to understanding the adoption of transparency and reproducibility practices in the social sciences.
    • Inform efforts to promote open science practices in sociology, criminology, and beyond.