# SciPaperLoader Diagnostic Guide

This guide explains how to use the diagnostic tools included with SciPaperLoader,
especially for addressing issues with the scraper module.

## Common Issues and Solutions

### 1. Scraper Still Runs After Being Stopped

**Symptoms:**
- The web interface shows the scraper as stopped, but papers are still being processed
- The `/scraper/stop` endpoint returns success, but processing continues
- Active tasks still show up in the Celery inspector

**Solution:**

```bash
# Run the emergency stop to force-terminate all tasks
make diagnostics  # Then select option 5 (Emergency stop)

# Or directly:
python tools/diagnostics/emergency_stop.py
```

The emergency stop performs these actions:
- Sets the scraper state to inactive in the database
- Revokes all running, reserved, and scheduled Celery tasks
- Purges all task queues
- Reverts papers with "Pending" status to their previous state

### 2. Workers Not Picking Up Code Changes

**Symptoms:**
- Code changes don't seem to have any effect
- Bug fixes don't work even though the code is updated
- Workers may still be running a previously loaded version of the modified code

**Solution:**

```bash
# Use the quick fix to stop tasks and restart workers
make diagnostics  # Then select option 6 (Quick fix)

# Or directly:
python tools/diagnostics/quick_fix.py
```

### 3. Investigating Task or Scraper Issues

```bash
# Run the full diagnostic tool
make diagnostics  # Then select option 3 (Full diagnostic report)

# Or directly:
python tools/diagnostics/diagnose_scraper.py
```

This tool will:
- Show current scraper state
- List all active, scheduled, and reserved tasks
- Display recent activity and error logs

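To give a sense of what the task listing involves, here is a minimal sketch that condenses Celery inspect replies into per-worker counts. It is not the code of `diagnose_scraper.py`; the names `summarize_tasks` and `format_report` are hypothetical, and the inputs are assumed to have the shape returned by `app.control.inspect().active()`, `.scheduled()`, and `.reserved()` (a dict keyed by worker name, or `None` when no worker replies).

```python
from typing import Any, Dict, List, Optional

Snapshot = Optional[Dict[str, List[Dict[str, Any]]]]


def summarize_tasks(
    active: Snapshot, scheduled: Snapshot, reserved: Snapshot
) -> Dict[str, Dict[str, int]]:
    """Count tasks per worker and state; a None reply contributes nothing."""
    summary: Dict[str, Dict[str, int]] = {}
    states = (("active", active), ("scheduled", scheduled), ("reserved", reserved))
    for state, snapshot in states:
        for worker, tasks in (snapshot or {}).items():
            counts = summary.setdefault(
                worker, {"active": 0, "scheduled": 0, "reserved": 0}
            )
            counts[state] += len(tasks)
    return summary


def format_report(summary: Dict[str, Dict[str, int]]) -> str:
    """Render one line per worker, or a notice when nothing responded."""
    if not summary:
        return "No workers responded."
    lines = []
    for worker in sorted(summary):
        c = summary[worker]
        lines.append(
            f"{worker}: active={c['active']} "
            f"scheduled={c['scheduled']} reserved={c['reserved']}"
        )
    return "\n".join(lines)
```

Non-zero counts while the scraper is reported as stopped would match the symptoms described in issue 1.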
## Preventative Measures

1. **Always stop the scraper properly** through the web interface before:
   - Restarting the application
   - Deploying code changes
   - Modifying the database

2. **Monitor task queue size** using the Flower web interface:
   ```bash
   make celery-flower
   ```
   Then visit http://localhost:5555

3. **Check logs for failed tasks** regularly in the Logger tab of the application

## For Developers

To test the paper reversion functionality:

```bash
make diagnostics  # Then select option 4 (Test paper reversion)

# Or directly:
python tools/diagnostics/test_reversion.py
```

This is particularly helpful after making changes to the scraper or task handling code.

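As a rough picture of the behaviour such a test exercises, the sketch below reverts papers marked "Pending" to an earlier status. Everything except the "Pending" status name is an assumption made for illustration: the `Paper` dataclass, its `previous_status` field, and the `revert_pending` function are hypothetical stand-ins, not the project's actual model or code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Paper:
    # Hypothetical stand-in for the project's paper model.
    doi: str
    status: str
    previous_status: Optional[str] = None


def revert_pending(papers: Iterable[Paper]) -> List[Paper]:
    """Restore the earlier status of papers still marked "Pending"."""
    reverted: List[Paper] = []
    for paper in papers:
        # Papers without a recorded earlier status are left untouched.
        if paper.status == "Pending" and paper.previous_status is not None:
            paper.status = paper.previous_status
            paper.previous_status = None
            reverted.append(paper)
    return reverted
```

A test along these lines would check that pending papers regain their earlier status and that papers in any other state are not modified.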