adds debugging tools
This commit is contained in: parent 3a21c4429b, commit ac348696b5

97 DEVELOPMENT
@ -1,97 +0,0 @@

## Directory Structure

Below is the directory and file layout for the `scipaperloader` project:

```plaintext
scipaperloader/
├── app/
│   ├── __init__.py              # Initialize Flask app and database
│   ├── models.py                # SQLAlchemy database models
│   ├── main.py                  # Flask routes (main blueprint)
│   ├── templates/               # Jinja2 templates for HTML pages
│   │   ├── base.html            # Base layout template with Alpine.js and HTMX
│   │   ├── index.html           # Home page template
│   │   ├── upload.html          # CSV upload page template
│   │   ├── schedule.html        # Schedule configuration page template
│   │   └── logs.html            # Logs display page template
│   └── static/                  # Static files (CSS, JS, images)
├── scraper.py                   # Background scraper daemon script
├── tests/
│   └── test_scipaperloader.py   # Tests with a Flask test fixture
├── config.py                    # Configuration settings for different environments
├── pyproject.toml               # Project metadata and build configuration
├── setup.cfg                    # Development tool configurations (linting, testing)
├── Makefile                     # Convenient commands for development tasks
└── .venv/                       # Python virtual environment (not in version control)
```

- The **`app/`** package contains the Flask application code. It includes an `__init__.py` to create the app and set up extensions, a `models.py` defining database models with SQLAlchemy, and a `main.py` defining routes in a Flask Blueprint. The `templates/` directory holds HTML templates (with Jinja2 syntax) and `static/` will contain static assets (e.g., custom CSS or JS files, if any).

- The **`scraper.py`** is a **standalone** Python script acting as a background daemon. It can be run separately to perform background scraping tasks (e.g., periodically fetching new data). This script will use the same database (via SQLAlchemy models or direct database access) to read or write data as needed.

- The **`tests/`** directory includes a test file that uses pytest to ensure the Flask app and its components work as expected. A Flask fixture creates an application instance for testing (with an in-memory database) and verifies routes and database operations (e.g., uploading CSV adds records).

- The **configuration and setup files** at the project root help in development and deployment. `config.py` defines configuration classes (for development, testing, production) so the app can be easily configured. `pyproject.toml` and `setup.cfg` provide project metadata and tool configurations (for packaging, linting, etc.), and a `Makefile` is included to simplify common tasks (running the app, tests, etc.). A minimal sketch of the app factory and configuration classes follows this list.
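
The layout above follows the application-factory pattern. The project's actual `create_app()` (imported elsewhere in this commit as `from scipaperloader import create_app`) and the classes in `config.py` are not shown in this diff; the following is only a minimal sketch of what such a factory and config classes might look like. Class names, database URIs, and the blueprint wiring are assumptions.

```python
# Illustrative sketch only -- the real factory lives in scipaperloader/__init__.py
# and the real settings in config.py; class names and URIs here are assumptions.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()  # in the project this instance is provided by scipaperloader.db


class Config:
    SQLALCHEMY_DATABASE_URI = "sqlite:///scipaperloader.db"  # assumed default


class TestingConfig(Config):
    TESTING = True
    SQLALCHEMY_DATABASE_URI = "sqlite:///:memory:"  # in-memory DB, as the tests use


def create_app(config_class=Config):
    """Application factory: create the Flask app, bind the DB, register blueprints."""
    app = Flask(__name__)
    app.config.from_object(config_class)
    db.init_app(app)
    # blueprint registration (e.g. the main blueprint from app/main.py) would go here
    return app
```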

## How to use the logger

### GUI Interactions:

```python
ActivityLog.log_gui_interaction(
    action="view_paper_details",
    description="User viewed paper details",
    paper_id=123
)
```

### Configuration Changes:

```python
ActivityLog.log_gui_interaction(
    action="view_paper_details",
    description="User viewed paper details",
    paper_id=123
)
```

### Scraper Commands:

```python
ActivityLog.log_scraper_command(
    action="start_scraper",
    status="running"
)
```

### Scraper Activities:

```python
ActivityLog.log_scraper_activity(
    action="download_paper",
    paper_id=123,
    status="success",
    description="Paper downloaded successfully",
    file_path="/papers/123.pdf"
)
```

### Error Logging:

```python
# Simple error
ActivityLog.log_error(
    error_message="Failed to connect to API",
    severity=ErrorSeverity.WARNING.value,
    source="api_client"
)

# Logging an exception
try:
    result = some_risky_operation()
except Exception as e:
    ActivityLog.log_error(
        error_message="Operation failed",
        exception=e,
        severity=ErrorSeverity.ERROR.value,
        source="data_processor",
        paper_id=paper_id
    )
```
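
The `ErrorSeverity` enum referenced above is defined elsewhere in the code base and is not part of this diff; the `.value` usage suggests an enum roughly along these lines (members other than WARNING and ERROR, and the underlying string values, are assumptions):

```python
from enum import Enum


class ErrorSeverity(Enum):
    # WARNING and ERROR are used in the examples above; the rest is assumed.
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"
```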
@ -28,4 +28,70 @@ scipaperloader/

- The **`app/`** package contains the Flask application code. It includes an `__init__.py` to create the app and set up extensions, a `models.py` defining database models with SQLAlchemy, and a `main.py` defining routes in a Flask Blueprint. The `templates/` directory holds HTML templates (with Jinja2 syntax) and `static/` will contain static assets (e.g., custom CSS or JS files, if any).

- The **`scraper.py`** is a **standalone** Python script acting as a background daemon. It can be run separately to perform background scraping tasks (e.g., periodically fetching new data). This script will use the same database (via SQLAlchemy models or direct database access) to read or write data as needed.

- The **`tests/`** directory includes a test file that uses pytest to ensure the Flask app and its components work as expected. A Flask fixture creates an application instance for testing (with an in-memory database) and verifies routes and database operations (e.g., uploading CSV adds records).

- The **configuration and setup files** at the project root help in development and deployment. `config.py` defines configuration classes (for development, testing, production) so the app can be easily configured. `pyproject.toml` and `setup.cfg` provide project metadata and tool configurations (for packaging, linting, etc.), and a `Makefile` is included to simplify common tasks (running the app, tests, etc.).

## How to use the logger

### GUI Interactions:

```python
ActivityLog.log_gui_interaction(
    action="view_paper_details",
    description="User viewed paper details",
    paper_id=123
)
```

### Configuration Changes:

```python
ActivityLog.log_gui_interaction(
    action="view_paper_details",
    description="User viewed paper details",
    paper_id=123
)
```

### Scraper Commands:

```python
ActivityLog.log_scraper_command(
    action="start_scraper",
    status="running"
)
```

### Scraper Activities:

```python
ActivityLog.log_scraper_activity(
    action="download_paper",
    paper_id=123,
    status="success",
    description="Paper downloaded successfully",
    file_path="/papers/123.pdf"
)
```

### Error Logging:

```python
# Simple error
ActivityLog.log_error(
    error_message="Failed to connect to API",
    severity=ErrorSeverity.WARNING.value,
    source="api_client"
)

# Logging an exception
try:
    result = some_risky_operation()
except Exception as e:
    ActivityLog.log_error(
        error_message="Operation failed",
        exception=e,
        severity=ErrorSeverity.ERROR.value,
        source="data_processor",
        paper_id=paper_id
    )
```
7 Makefile

@ -1,5 +1,5 @@
# List of phony targets (targets that don't represent files)
.PHONY: all clean venv run format format-check lint mypy test dist reformat dev celery celery-flower redis run-all
.PHONY: all clean venv run format format-check lint mypy test dist reformat dev celery celery-flower redis run-all diagnostics

# Define Python and pip executables inside virtual environment
PYTHON := venv/bin/python

@ -190,5 +190,10 @@ stop-flask:
stop-all: stop-celery stop-flask
	@echo "All components stopped."

# Run diagnostic tools
# Run diagnostic tools - works with or without virtualenv
diagnostics:
	$(PYTHON) tools/run_diagnostics.py

# Default target
all: run
57 README.md

@ -121,4 +121,59 @@ When deploying to production:

1. Configure a production-ready Redis instance or use a managed service
2. Run Celery workers as system services or in Docker containers
3. Consider setting up monitoring for your Celery tasks and workers

## Troubleshooting and Diagnostics

SciPaperLoader includes a collection of diagnostic and emergency tools to help address issues with the application, particularly with the scraper and Celery task system.

### Quick Access

For easy access to all diagnostic tools through an interactive menu:

```bash
# Using Make:
make diagnostics

# Using the shell scripts (works with any shell):
./tools/run-diagnostics.sh

# Fish shell version:
./tools/run-diagnostics.fish

# Or directly with Python:
python tools/diagnostics/diagnostic_menu.py
```

### Diagnostic Tools

All diagnostic tools are located in the `tools/diagnostics/` directory:

- **check_state.py**: Quickly check the current state of the scraper in the database
- **diagnose_scraper.py**: Comprehensive diagnostic tool that examines tasks, logs, and scraper state
- **inspect_tasks.py**: View currently running, scheduled, and reserved Celery tasks
- **test_reversion.py**: Test the paper reversion functionality when stopping the scraper

### Emergency Recovery

For cases where the scraper is stuck or behaving unexpectedly:

- **emergency_stop.py**: Force stops all scraper activities, revokes all running tasks, and reverts papers from "Pending" state
- **quick_fix.py**: Simplified emergency stop that also restarts Celery workers to ensure code changes are applied

### Usage Example

```bash
# Check the current state of the scraper
python tools/diagnostics/check_state.py

# Diagnose issues with tasks and logs
python tools/diagnostics/diagnose_scraper.py

# Emergency stop when scraper is stuck
python tools/diagnostics/emergency_stop.py
```

For more information, see:

- The README in the `tools/diagnostics/` directory
- The comprehensive `tools/DIAGNOSTIC_GUIDE.md` for troubleshooting specific issues
89 tools/DIAGNOSTIC_GUIDE.md (Normal file)

@ -0,0 +1,89 @@

# SciPaperLoader Diagnostic Guide

This guide explains how to use the diagnostic tools included with SciPaperLoader, especially for addressing issues with the scraper module.

## Common Issues and Solutions

### 1. Scraper Still Runs After Being Stopped

**Symptoms:**

- Web interface shows scraper as stopped but papers are still being processed
- `/scraper/stop` endpoint returns success but processing continues
- Active tasks show up in Celery inspector

**Solutions:**

```bash
# Run the emergency stop to force-terminate all tasks
make diagnostics # Then select option 5 (Emergency stop)

# Or directly:
python tools/diagnostics/emergency_stop.py
```

The emergency stop performs these actions (sketched below):

- Sets scraper state to inactive in the database
- Revokes all running, reserved, and scheduled Celery tasks
- Purges all task queues
- Reverts papers with "Pending" status to their previous state
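
`emergency_stop.py` itself is not shown in this commit. The following is a rough sketch of how those four steps might be implemented, based only on the models and Celery calls used by the other tools in this diff; the loop structure and fallback status are assumptions, not the actual script.

```python
# Rough sketch of the emergency-stop steps; not the actual emergency_stop.py.
from scipaperloader import create_app
from scipaperloader.db import db
from scipaperloader.models import ScraperState, PaperMetadata
from scipaperloader.celery import celery

app = create_app()
with app.app_context():
    # 1. Mark the scraper inactive in the database
    ScraperState.set_active(False)
    ScraperState.set_paused(False)

    # 2. Revoke active, reserved, and scheduled tasks
    inspector = celery.control.inspect()
    for pool in (inspector.active(), inspector.reserved(), inspector.scheduled()):
        for worker_tasks in (pool or {}).values():
            for task in worker_tasks:
                task_id = task.get("id") or task.get("request", {}).get("id")
                if task_id:
                    celery.control.revoke(task_id, terminate=True)

    # 3. Purge anything still waiting in the queues
    celery.control.purge()

    # 4. Revert "Pending" papers to their previous status (fallback: "New")
    for paper in PaperMetadata.query.filter_by(status="Pending").all():
        paper.status = paper.previous_status or "New"
    db.session.commit()
```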

### 2. Workers Not Picking Up Code Changes

**Symptoms:**

- Code changes don't seem to have any effect
- Bug fixes don't work even though the code is updated
- Workers might be using cached versions of modified code

**Solution:**

```bash
# Use the quick fix to stop tasks and restart workers
make diagnostics # Then select option 6 (Quick fix)

# Or directly:
python tools/diagnostics/quick_fix.py
```

### 3. Investigating Task or Scraper Issues

```bash
# Run the full diagnostic tool
make diagnostics # Then select option 3 (Full diagnostic report)

# Or directly:
python tools/diagnostics/diagnose_scraper.py
```

This tool will:

- Show current scraper state
- List all active, scheduled, and reserved tasks
- Display recent activity and error logs (a sketch of this kind of query follows)
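
`diagnose_scraper.py` is moved rather than added in this commit, so its body does not appear here. A minimal sketch of the last step above, pulling recent entries from the `ActivityLog` model used throughout these tools, might look like this; the number of rows and the printed fields are illustrative choices.

```python
# Sketch of listing recent activity; not the actual diagnose_scraper.py code.
from scipaperloader import create_app
from scipaperloader.models import ActivityLog

app = create_app()
with app.app_context():
    recent = ActivityLog.query.order_by(ActivityLog.timestamp.desc()).limit(20).all()
    for entry in recent:
        # action, timestamp and paper_id are fields used by the other diagnostic scripts
        print(f"{entry.timestamp}  {entry.action}  paper_id={entry.paper_id}")
```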

## Preventative Measures

1. **Always stop the scraper properly** through the web interface before:
   - Restarting the application
   - Deploying code changes
   - Modifying the database

2. **Monitor task queue size** using Flower web interface:
   ```bash
   make celery-flower
   ```
   Then visit http://localhost:5555

3. **Check logs for failed tasks** regularly in the Logger tab of the application

## For Developers

To test the paper reversion functionality:

```bash
make diagnostics # Then select option 4 (Test paper reversion)

# Or directly:
python tools/diagnostics/test_reversion.py
```

This is particularly helpful after making changes to the scraper or task handling code.
63 tools/diagnostics/README.md (Normal file)

@ -0,0 +1,63 @@

# SciPaperLoader Diagnostic and Emergency Tools

This directory contains various scripts for diagnosing issues, debugging, and handling emergency situations with the SciPaperLoader application.

## Available Tools

### Scraper Management

- **emergency_stop.py**: Force stops all scraper activities, revokes running tasks, and reverts papers from "Pending" state
- **quick_fix.py**: A simplified emergency stop that also restarts Celery workers to ensure code changes are applied
- **test_reversion.py**: Tests the paper reversion functionality when stopping the scraper

### Monitoring and Diagnostics

- **check_state.py**: Checks the current state of the scraper in the database
- **diagnose_scraper.py**: Comprehensive diagnostic tool that examines tasks, logs, and scraper state
- **inspect_tasks.py**: Displays currently running, scheduled, and reserved Celery tasks

## Usage

### Interactive Diagnostic Menu

For a user-friendly interface to all diagnostic tools, run:

```bash
cd /path/to/SciPaperLoader
python tools/run_diagnostics.py
```

This will launch the diagnostic menu where you can select which tool to run.

### Emergency Stop

When the scraper needs to be immediately stopped, regardless of its state:

```bash
cd /path/to/SciPaperLoader
python tools/diagnostics/emergency_stop.py
```

### Diagnosing Issues

To investigate problems with the scraper or task queue:

```bash
cd /path/to/SciPaperLoader
python tools/diagnostics/diagnose_scraper.py
```

### Quick Fix for Worker Issues

If tasks continue to run despite stopping the scraper:

```bash
cd /path/to/SciPaperLoader
python tools/diagnostics/quick_fix.py
```

## Notes

- Always run these scripts from the project root directory
- Some scripts may require a running Redis server
- After using emergency tools, the application may need to be restarted completely
5 check_state.py → tools/diagnostics/check_state.py (Normal file → Executable file)

@ -1,3 +1,8 @@
#!/usr/bin/env python3
"""
Check the current state of the scraper (active/inactive, paused/unpaused)
"""

from scipaperloader.models import ScraperState
from scipaperloader import create_app
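
Only the top of `check_state.py` appears in this hunk. A minimal sketch of what the rest of the check might look like, assuming the `ScraperState` fields used by `test_reversion.py` later in this commit:

```python
# Hypothetical continuation of check_state.py; the actual body is not shown in this diff.
app = create_app()
with app.app_context():
    state = ScraperState.get_current_state()
    print(f"Scraper active: {state.is_active}")
    print(f"Scraper paused: {state.is_paused}")
```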
0 diagnose_scraper.py → tools/diagnostics/diagnose_scraper.py (Normal file → Executable file)

86 tools/diagnostics/diagnostic_menu.py (Executable file)
@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""
SciPaperLoader Diagnostic Menu

A simple menu-based interface to run common diagnostic operations.
"""

import os
import sys
import subprocess
from pathlib import Path

# Add project root to path
SCRIPT_DIR = Path(__file__).parent
PROJECT_ROOT = SCRIPT_DIR.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))


def clear_screen():
    """Clear the terminal screen."""
    os.system('cls' if os.name == 'nt' else 'clear')


def run_script(script_name):
    """Run a Python script and wait for it to complete."""
    script_path = SCRIPT_DIR / script_name
    print(f"\nRunning {script_name}...\n")
    try:
        # Add execute permission if needed
        if not os.access(script_path, os.X_OK):
            os.chmod(script_path, os.stat(script_path).st_mode | 0o100)

        subprocess.run([sys.executable, str(script_path)], check=True)
    except subprocess.CalledProcessError as e:
        print(f"\nError running {script_name}: {e}")
    except FileNotFoundError:
        print(f"\nError: Script {script_name} not found at {script_path}")

    input("\nPress Enter to continue...")


def main_menu():
    """Display the main menu and handle user selection."""
    while True:
        clear_screen()
        print("=" * 50)
        print(" SciPaperLoader Diagnostic Menu")
        print("=" * 50)
        print("1. Check scraper state")
        print("2. Inspect running tasks")
        print("3. Full diagnostic report")
        print("4. Test paper reversion")
        print("5. Emergency stop (stop all tasks)")
        print("6. Quick fix (stop & restart workers)")
        print("0. Exit")
        print("=" * 50)

        choice = input("Enter your choice (0-6): ")

        if choice == "0":
            clear_screen()
            print("Exiting diagnostic menu.")
            break
        elif choice == "1":
            run_script("check_state.py")
        elif choice == "2":
            run_script("inspect_tasks.py")
        elif choice == "3":
            run_script("diagnose_scraper.py")
        elif choice == "4":
            run_script("test_reversion.py")
        elif choice == "5":
            confirm = input("Are you sure you want to emergency stop all tasks? (y/n): ")
            if confirm.lower() == 'y':
                run_script("emergency_stop.py")
        elif choice == "6":
            confirm = input("Are you sure you want to stop all tasks and restart workers? (y/n): ")
            if confirm.lower() == 'y':
                run_script("quick_fix.py")
        else:
            print("\nInvalid choice. Press Enter to try again.")
            input()


if __name__ == "__main__":
    try:
        main_menu()
    except KeyboardInterrupt:
        print("\nExiting due to user interrupt.")
        sys.exit(0)
@ -17,7 +17,7 @@ import time
from datetime import datetime

# Add project root to path
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

# Import required modules
from scipaperloader import create_app
11 tools/diagnostics/inspect_tasks.py (Executable file)

@ -0,0 +1,11 @@
#!/usr/bin/env python3
"""
Inspect current Celery tasks (active, reserved, and scheduled)
"""

from scipaperloader.celery import celery

i = celery.control.inspect()
print("Active tasks:", i.active())
print("Reserved tasks:", i.reserved())
print("Scheduled tasks:", i.scheduled())
103 tools/diagnostics/quick_fix.py (Executable file)

@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
Quick fix script to stop all running scraper tasks and restart Celery workers.
This ensures the updated code is loaded and tasks are properly terminated.
"""

import os
import sys
import signal
import subprocess
import time
from datetime import datetime

# Add project root to path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))


def kill_celery_processes():
    """Kill all running Celery processes"""
    print("Killing Celery processes...")
    try:
        # Get all celery processes
        result = subprocess.run(['pgrep', '-f', 'celery'], capture_output=True, text=True)
        if result.returncode == 0:
            pids = result.stdout.strip().split('\n')
            for pid in pids:
                if pid:
                    try:
                        os.kill(int(pid), signal.SIGTERM)
                        print(f" Killed process {pid}")
                    except ProcessLookupError:
                        pass  # Process already dead

        # Wait a moment for graceful shutdown
        time.sleep(2)

        # Force kill any remaining processes
        result = subprocess.run(['pgrep', '-f', 'celery'], capture_output=True, text=True)
        if result.returncode == 0:
            pids = result.stdout.strip().split('\n')
            for pid in pids:
                if pid:
                    try:
                        os.kill(int(pid), signal.SIGKILL)
                        print(f" Force killed process {pid}")
                    except ProcessLookupError:
                        pass

        print("✓ All Celery processes terminated")
    except Exception as e:
        print(f"⚠ Error killing processes: {e}")


def stop_scraper_state():
    """Set scraper state to inactive using Flask app context"""
    try:
        from scipaperloader import create_app
        from scipaperloader.models import ScraperState, PaperMetadata
        from scipaperloader.db import db

        app = create_app()
        with app.app_context():
            # Set scraper to inactive
            ScraperState.set_active(False)
            ScraperState.set_paused(False)
            print("✓ Set scraper state to inactive")

            # Revert any pending papers to "New" status (simple approach since we don't have previous_status data yet)
            pending_papers = PaperMetadata.query.filter_by(status="Pending").all()
            reverted_count = 0

            for paper in pending_papers:
                paper.status = "New"  # Simple fallback - revert all to "New"
                reverted_count += 1

            if reverted_count > 0:
                db.session.commit()
                print(f"✓ Reverted {reverted_count} papers from 'Pending' to 'New'")
            else:
                print("✓ No pending papers to revert")

    except Exception as e:
        print(f"⚠ Error setting scraper state: {e}")


def main():
    print("=== QUICK SCRAPER FIX ===")
    print(f"Time: {datetime.now()}")
    print()

    # Step 1: Stop scraper state
    stop_scraper_state()

    # Step 2: Kill all Celery processes
    kill_celery_processes()

    print()
    print("=== FIX COMPLETE ===")
    print("The scraper has been stopped and all tasks terminated.")
    print("You can now restart the Celery workers with:")
    print(" make celery")
    print("or")
    print(" make run")


if __name__ == "__main__":
    main()
101 tools/diagnostics/test_reversion.py (Executable file)

@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""
Test script for verifying the paper reversion fix.
This script:
1. Simulates stopping the scraper
2. Checks that all pending papers were reverted to their previous status
3. Ensures all running tasks were terminated
"""

import os
import sys
import time
from datetime import datetime
from sqlalchemy import func
from flask import Flask

# Add project root to path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

# Import the app and models
from scipaperloader import create_app
from scipaperloader.db import db
from scipaperloader.models import PaperMetadata, ActivityLog, ScraperState
from scipaperloader.celery import celery

app = create_app()


def test_stop_scraper():
    """Test the stop_scraper functionality"""

    with app.app_context():
        # First check current scraper state
        scraper_state = ScraperState.get_current_state()
        print(f"Current scraper state: active={scraper_state.is_active}, paused={scraper_state.is_paused}")

        # Check if there are any papers in "Pending" state
        pending_count = PaperMetadata.query.filter_by(status="Pending").count()
        print(f"Papers in 'Pending' state before stopping: {pending_count}")

        if pending_count == 0:
            print("No papers in 'Pending' state to test with.")
            print("Would you like to create a test paper in Pending state? (y/n)")
            choice = input().lower()
            if choice == 'y':
                # Create a test paper
                paper = PaperMetadata(
                    title="Test Paper for Reversion",
                    doi="10.1234/test.123",
                    status="Pending",
                    previous_status="New",  # Test value we expect to be reverted to
                    created_at=datetime.utcnow(),
                    updated_at=datetime.utcnow()
                )
                db.session.add(paper)
                db.session.commit()
                print(f"Created test paper with ID {paper.id}, status='Pending', previous_status='New'")
                pending_count = 1

        # Simulate the stop_scraper API call
        from scipaperloader.blueprints.scraper import revert_pending_papers
        print("Reverting pending papers...")
        reverted = revert_pending_papers()
        print(f"Reverted {reverted} papers from 'Pending' state")

        # Check if any papers are still in "Pending" state
        still_pending = PaperMetadata.query.filter_by(status="Pending").count()
        print(f"Papers still in 'Pending' state after stopping: {still_pending}")

        # List any that were reverted and their current status
        if reverted > 0:
            print("\nPapers that were reverted:")
            recent_logs = ActivityLog.query.filter_by(action="revert_pending").order_by(
                ActivityLog.timestamp.desc()).limit(10).all()

            for log in recent_logs:
                paper = PaperMetadata.query.get(log.paper_id)
                if paper:
                    print(f"Paper ID {paper.id}: '{paper.title}' - Now status='{paper.status}'")

        # Check active celery tasks
        i = celery.control.inspect()
        active = i.active() or {}
        reserved = i.reserved() or {}
        scheduled = i.scheduled() or {}

        active_count = sum(len(tasks) for worker, tasks in active.items())
        reserved_count = sum(len(tasks) for worker, tasks in reserved.items())
        scheduled_count = sum(len(tasks) for worker, tasks in scheduled.items())

        print(f"\nCurrently {active_count} active, {reserved_count} reserved, and {scheduled_count} scheduled tasks")

        # Print conclusion
        if still_pending == 0 and reverted > 0:
            print("\nSUCCESS: All pending papers were properly reverted!")
        elif still_pending > 0:
            print(f"\nWARNING: {still_pending} papers are still in 'Pending' state!")
        elif pending_count == 0 and reverted == 0:
            print("\nNo papers to revert. Can't fully test.")


if __name__ == "__main__":
    test_stop_scraper()
24 tools/run-diagnostics.sh (Executable file)

@ -0,0 +1,24 @@
#!/bin/bash
# Simple launcher for diagnostic menu
# This script works regardless of virtualenv status

# Find the directory where this script is located
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
PROJECT_ROOT="$( cd "$SCRIPT_DIR/.." &> /dev/null && pwd )"
MENU_SCRIPT="$SCRIPT_DIR/diagnostics/diagnostic_menu.py"

# Go to project root to ensure correct paths
cd "$PROJECT_ROOT"

# Check if the diagnostic menu exists
if [ ! -f "$MENU_SCRIPT" ]; then
    echo "Error: Diagnostic menu not found at $MENU_SCRIPT"
    echo "Please make sure you're running this script from the project root directory."
    exit 1
fi

# Make sure the script is executable
chmod +x "$MENU_SCRIPT"

# Run the diagnostic menu
python "$MENU_SCRIPT"
53 tools/run_diagnostics.py (Executable file)

@ -0,0 +1,53 @@
#!/usr/bin/env python3
"""
SciPaperLoader Diagnostic Tools - One-click launcher

This script provides a simple way to launch the diagnostic menu.
"""

import os
import sys
import subprocess
from pathlib import Path

# Find the diagnostic_menu.py script
SCRIPT_DIR = Path(__file__).parent
MENU_SCRIPT = SCRIPT_DIR / "diagnostics" / "diagnostic_menu.py"


def main():
    """Run the diagnostic menu."""
    # Check if diagnostics directory exists
    diagnostics_dir = SCRIPT_DIR / "diagnostics"
    if not diagnostics_dir.exists():
        print(f"Error: Diagnostics directory not found at {diagnostics_dir}")
        print("Make sure you're running this script from the project root directory.")
        return 1

    # Check if menu script exists
    if not MENU_SCRIPT.exists():
        print(f"Error: Diagnostic menu not found at {MENU_SCRIPT}")
        print("Available files in the diagnostics directory:")
        try:
            for file in diagnostics_dir.glob("*.py"):
                print(f" - {file.name}")
        except Exception:
            print(" Could not list files in the diagnostics directory.")
        return 1

    # Ensure the script is executable
    if not os.access(MENU_SCRIPT, os.X_OK):
        try:
            os.chmod(MENU_SCRIPT, os.stat(MENU_SCRIPT).st_mode | 0o100)
        except Exception as e:
            print(f"Warning: Could not make script executable: {e}")

    # Run the menu script
    try:
        subprocess.run([sys.executable, str(MENU_SCRIPT)], check=True)
        return 0
    except subprocess.CalledProcessError as e:
        print(f"Error running diagnostic menu: {e}")
        return 1


if __name__ == "__main__":
    sys.exit(main())