SciPaperLoader/tools/DIAGNOSTIC_GUIDE.md
2025-05-24 12:39:23 +02:00

SciPaperLoader Diagnostic Guide

This guide explains how to use the diagnostic tools included with SciPaperLoader, especially for addressing issues with the scraper module.

Common Issues and Solutions

1. Scraper Still Runs After Being Stopped

Symptoms:

  • Web interface shows scraper as stopped but papers are still being processed
  • /scraper/stop endpoint returns success but processing continues
  • Active tasks show up in Celery inspector

Solution:

# Run the emergency stop to force-terminate all tasks
make diagnostics   # Then select option 5 (Emergency stop)

# Or directly:
python tools/diagnostics/emergency_stop.py

The emergency stop performs these actions:

  • Sets scraper state to inactive in the database
  • Revokes all running, reserved, and scheduled Celery tasks
  • Purges all task queues
  • Reverts papers with "Pending" status to their previous state
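
The Celery part of these steps can be sketched as follows. This is a minimal illustration, not the actual contents of emergency_stop.py: the function names are made up, and `app` is assumed to be the project's Celery application instance. It relies only on Celery's remote-control API (`app.control.inspect()`, `app.control.revoke()`, `app.control.purge()`).

```python
from typing import Any, Optional


def collect_task_ids(replies: Optional[dict], nested_key: Optional[str] = None) -> list:
    """Flatten one inspect reply ({worker: [task, ...]}) into a list of task ids."""
    ids = []
    # Inspect calls return None when no worker answers, so fall back to {}.
    for tasks in (replies or {}).values():
        for task in tasks:
            info = task[nested_key] if nested_key else task
            ids.append(info["id"])
    return ids


def revoke_everything(app: Any) -> dict:
    """Revoke active, reserved, and scheduled tasks, then purge the queues.

    `app` is assumed to be a Celery application (hypothetical wiring).
    """
    inspector = app.control.inspect()
    task_ids = (
        collect_task_ids(inspector.active())
        + collect_task_ids(inspector.reserved())
        # scheduled() wraps each task description in a "request" entry
        + collect_task_ids(inspector.scheduled(), nested_key="request")
    )
    for task_id in task_ids:
        # terminate=True also stops tasks that are already executing
        app.control.revoke(task_id, terminate=True)
    # purge() discards messages still waiting in the task queues
    purged = app.control.purge()
    return {"revoked": task_ids, "purged": purged}
```

Revoking before purging matters in this sketch: purging only removes messages that have not been delivered yet, while tasks already reserved or running on a worker have to be revoked explicitly.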

2. Workers Not Picking Up Code Changes

Symptoms:

  • Code changes don't seem to have any effect
  • Bug fixes don't work even though the code is updated
  • Workers might be using cached versions of modified code

Solution:

# Use the quick fix to stop tasks and restart workers
make diagnostics   # Then select option 6 (Quick fix)

# Or directly:
python tools/diagnostics/quick_fix.py
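
Why a restart is needed can be shown with a small standalone experiment. A long-running Python process, such as a worker, imports each module once and keeps it in `sys.modules`. Editing the file on disk does not change the code already loaded. The module name `demo_task_module` below is made up for the demonstration and is not part of SciPaperLoader.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module on disk (hypothetical name, demo only).
module_dir = tempfile.mkdtemp()
module_path = Path(module_dir) / "demo_task_module.py"
module_path.write_text("VERSION = 'old'\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("demo_task_module")

# Simulate deploying a fix while the process keeps running.
module_path.write_text("VERSION = 'new, fixed'\n")

# Importing again returns the cached module object from sys.modules;
# the file on disk is not re-read.
second = importlib.import_module("demo_task_module")

version_on_disk = module_path.read_text().strip()
version_in_process = second.VERSION

print(version_on_disk)     # VERSION = 'new, fixed'
print(version_in_process)  # old
```

The running process keeps reporting the old value until it is restarted, which matches the symptoms listed above.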

3. Investigating Task or Scraper Issues

# Run the full diagnostic tool
make diagnostics   # Then select option 3 (Full diagnostic report)

# Or directly:
python tools/diagnostics/diagnose_scraper.py

This tool will:

  • Show current scraper state
  • List all active, scheduled, and reserved tasks
  • Display recent activity and error logs
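
The task listing can be approximated with Celery's inspect API. The sketch below is an assumption about how such a report could be built, not the code in diagnose_scraper.py; `summarize_tasks` is a hypothetical helper. It takes the replies of `inspect().active()`, `inspect().scheduled()`, and `inspect().reserved()`, each of which is a dict mapping worker names to task lists, or None when no worker responds.

```python
from typing import Optional


def summarize_tasks(
    active: Optional[dict],
    scheduled: Optional[dict],
    reserved: Optional[dict],
) -> str:
    """Build a short text report of task counts per state and per worker."""
    lines = []
    for label, reply in (("active", active), ("scheduled", scheduled), ("reserved", reserved)):
        if reply is None:
            # No worker answered the inspect broadcast
            lines.append(f"{label}: no reply from workers")
            continue
        total = sum(len(tasks) for tasks in reply.values())
        lines.append(f"{label}: {total} task(s) on {len(reply)} worker(s)")
        for worker, tasks in sorted(reply.items()):
            lines.append(f"  {worker}: {len(tasks)}")
    return "\n".join(lines)


# With a real Celery app (assumed to be importable as `app`), usage would be:
#   inspector = app.control.inspect()
#   print(summarize_tasks(inspector.active(), inspector.scheduled(), inspector.reserved()))
```

A "no reply from workers" line usually means that no worker is running or that the broker is unreachable, which is itself a useful diagnostic result.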

Preventative Measures

  1. Always stop the scraper properly through the web interface before:

    • Restarting the application
    • Deploying code changes
    • Modifying the database
  2. Monitor task queue size using the Flower web interface:

    make celery-flower

    Then visit http://localhost:5555

  3. Regularly check the Logger tab of the application for failed tasks

For Developers

To test the paper reversion functionality:

make diagnostics   # Then select option 4 (Test paper reversion)

# Or directly:
python tools/diagnostics/test_reversion.py

This is particularly helpful after making changes to the scraper or task-handling code.
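
The reversion behavior under test can be pictured with a small model. Everything here is an assumption made for illustration: the `Paper` class, the `previous_status` field, and the status names other than "Pending" are hypothetical and may not match the project's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Paper:
    # Hypothetical stand-in for the project's paper model
    doi: str
    status: str
    previous_status: Optional[str] = None  # assumed field; real schema may differ


def revert_pending(papers: Iterable[Paper]) -> List[Paper]:
    """Restore the earlier status of every paper that is stuck in "Pending"."""
    reverted = []
    for paper in papers:
        # Papers without a recorded previous status are left untouched
        if paper.status == "Pending" and paper.previous_status is not None:
            paper.status = paper.previous_status
            paper.previous_status = None
            reverted.append(paper)
    return reverted


papers = [
    Paper("10.1000/a", "Pending", previous_status="New"),
    Paper("10.1000/b", "Done"),
    Paper("10.1000/c", "Pending"),  # no previous status recorded
]
reverted = revert_pending(papers)
```

In this sketch only the first paper is reverted. A test of the real reversion code would check the same two properties: papers in "Pending" return to their earlier state, and all other papers stay unchanged.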