# Compare commits

3 commits: `f1d93a244e...e15867c9a6`

| SHA1 | Author | Date |
| --- | --- | --- |
| e15867c9a6 | | |
| b09c6f1b9b | | |
| 6e119f1412 | | |
**DEVELOPMENT** (+32 lines)
@@ -1,3 +1,35 @@

## Directory Structure

Below is the directory and file layout for the `scipaperloader` project:

```plaintext
scipaperloader/
├── app/
│   ├── __init__.py            # Initialize Flask app and database
│   ├── models.py              # SQLAlchemy database models
│   ├── main.py                # Flask routes (main blueprint)
│   ├── templates/             # Jinja2 templates for HTML pages
│   │   ├── base.html          # Base layout template with Alpine.js and HTMX
│   │   ├── index.html         # Home page template
│   │   ├── upload.html        # CSV upload page template
│   │   ├── schedule.html      # Schedule configuration page template
│   │   └── logs.html          # Logs display page template
│   └── static/                # Static files (CSS, JS, images)
├── scraper.py                 # Background scraper daemon script
├── tests/
│   └── test_scipaperloader.py # Tests with a Flask test fixture
├── config.py                  # Configuration settings for different environments
├── pyproject.toml             # Project metadata and build configuration
├── setup.cfg                  # Development tool configurations (linting, testing)
├── Makefile                   # Convenient commands for development tasks
└── .venv/                     # Python virtual environment (not in version control)
```

- The **`app/`** package contains the Flask application code. It includes an `__init__.py` to create the app and set up extensions, a `models.py` defining database models with SQLAlchemy, and a `main.py` defining routes in a Flask Blueprint. The `templates/` directory holds HTML templates (with Jinja2 syntax), and `static/` will contain static assets (e.g., custom CSS or JS files, if any).

- **`scraper.py`** is a **standalone** Python script acting as a background daemon. It can be run separately to perform background scraping tasks (e.g., periodically fetching new data). This script uses the same database (via the SQLAlchemy models or direct database access) to read and write data as needed.

- The **`tests/`** directory includes a test file that uses pytest to ensure the Flask app and its components work as expected. A Flask fixture creates an application instance for testing (with an in-memory database) and verifies routes and database operations (e.g., uploading a CSV adds records).

- The **configuration and setup files** at the project root help in development and deployment. `config.py` defines configuration classes (for development, testing, and production) so the app can be easily configured. `pyproject.toml` and `setup.cfg` provide project metadata and tool configurations (for packaging, linting, etc.), and a `Makefile` is included to simplify common tasks (running the app, tests, etc.).

## How to use the logger

### GUI Interactions:
**README.md** (+22 lines)
@@ -82,6 +82,28 @@ The following environment variables can be set to configure Celery:

Consider using
[dotenv](https://flask.palletsprojects.com/en/3.0.x/cli/#environment-variables-from-dotenv).

## Database Migrations with Flask-Migrate

SciPaperLoader uses Flask-Migrate (based on Alembic) to handle database schema changes. This allows for version-controlled database updates that can be applied or rolled back as needed.

### Database Migration Commands

- `make db-migrate message="Description of changes"`: Create a new migration script based on detected model changes
- `make db-upgrade`: Apply all pending migration scripts to the database
- `make db-downgrade`: Revert the most recent migration
- `make reset-db`: Reset the database completely (delete, initialize, and migrate)

### Working with Migrations

When you make changes to the database models (in `models.py`):

1. Create a migration: `make db-migrate message="Add user roles table"`
2. Review the generated migration script in the `migrations/versions/` directory
3. Apply the migration: `make db-upgrade`
4. To roll back a problematic migration: `make db-downgrade`

Always create database backups before applying migrations in production using `make backup-db`.

## Deployment

See [Deploying to Production](https://flask.palletsprojects.com/en/3.0.x/deploying/).
```diff
@@ -18,7 +18,7 @@ def create_app(test_config=None):
         app.config.update(test_config)

     db.init_app(app)
-    migrate = Migrate(app, db)  # Add this line to initialize Flask-Migrate
+    migrate = Migrate(app, db)

     with app.app_context():
         db.create_all()
```
```diff
@@ -6,6 +6,8 @@ from .papers import bp as papers_bp
 from .upload import bp as upload_bp
 from .schedule import bp as schedule_bp
 from .logger import bp as logger_bp
+from .api import bp as api_bp
+from .scraper import bp as scraper_bp


 def register_blueprints(app: Flask):
@@ -14,4 +16,6 @@ def register_blueprints(app: Flask):
     app.register_blueprint(papers_bp, url_prefix='/papers')
     app.register_blueprint(upload_bp, url_prefix='/upload')
     app.register_blueprint(schedule_bp, url_prefix='/schedule')
     app.register_blueprint(logger_bp, url_prefix='/logs')
+    app.register_blueprint(api_bp, url_prefix='/api')
+    app.register_blueprint(scraper_bp, url_prefix='/scraper')
```
**scipaperloader/blueprints/api.py** (new file, +50 lines)
@@ -0,0 +1,50 @@

```python
from datetime import datetime
from flask import Blueprint, jsonify, request
from ..models import ActivityLog, ActivityCategory

bp = Blueprint("api", __name__, url_prefix="/api")


@bp.route("/activity_logs")
def get_activity_logs():
    """Get activity logs with filtering options."""
    # Get query parameters
    category = request.args.get("category")
    action = request.args.get("action")
    after = request.args.get("after")
    limit = request.args.get("limit", 20, type=int)

    # Build query
    query = ActivityLog.query

    if category:
        query = query.filter(ActivityLog.category == category)

    if action:
        query = query.filter(ActivityLog.action == action)

    if after:
        try:
            after_date = datetime.fromisoformat(after.replace("Z", "+00:00"))
            query = query.filter(ActivityLog.timestamp > after_date)
        except (ValueError, TypeError):
            pass

    # Order by most recent first and limit results
    logs = query.order_by(ActivityLog.timestamp.desc()).limit(limit).all()

    # Format the results
    result = []
    for log in logs:
        log_data = {
            "id": log.id,
            "timestamp": log.timestamp.isoformat(),
            "category": log.category,
            "action": log.action,
            "description": log.description,
            "status": log.status,
            "paper_id": log.paper_id,
            "extra_data": log.extra_data,
        }
        result.append(log_data)

    return jsonify(result)
```
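The `after` filter accepts ISO 8601 timestamps. Because `datetime.fromisoformat` rejects a trailing `Z` suffix on Python versions before 3.11, the endpoint rewrites `Z` to the equivalent `+00:00` offset before parsing. A standalone sketch of that normalization (the helper name `parse_after` is mine, not from the codebase):

```python
from datetime import datetime


def parse_after(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z' (UTC) suffix."""
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


# The result is timezone-aware, so it compares correctly against
# timezone-aware timestamps stored in the database.
ts = parse_after("2024-05-01T12:30:00Z")
print(ts.isoformat())  # 2024-05-01T12:30:00+00:00
```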
**scipaperloader/blueprints/scraper.py** (new file, +512 lines)
@@ -0,0 +1,512 @@

```python
import random
import json
import time
from datetime import datetime, timedelta
from flask import Blueprint, jsonify, render_template, request, current_app, flash
from ..models import ScheduleConfig, VolumeConfig, ActivityLog, PaperMetadata, ActivityCategory
from ..db import db
from ..celery import celery

bp = Blueprint("scraper", __name__, url_prefix="/scraper")

# Global variables to track scraper state
SCRAPER_ACTIVE = False
SCRAPER_PAUSED = False


@bp.route("/")
def index():
    """Render the scraper control panel."""
    volume_config = VolumeConfig.query.first()

    # Ensure we have volume config
    if not volume_config:
        volume_config = VolumeConfig(volume=100)  # Default value
        db.session.add(volume_config)
        db.session.commit()

    # Ensure we have schedule config for all hours
    existing_hours = {record.hour: record for record in ScheduleConfig.query.all()}
    schedule_config = {}

    for hour in range(24):
        if hour in existing_hours:
            schedule_config[hour] = existing_hours[hour].weight
        else:
            # Create default schedule entry (weight 1.0)
            new_config = ScheduleConfig(hour=hour, weight=1.0)
            db.session.add(new_config)
            schedule_config[hour] = 1.0

    if len(existing_hours) < 24:
        db.session.commit()

    return render_template(
        "scraper.html.jinja",
        volume_config=volume_config,
        schedule_config=schedule_config,
        scraper_active=SCRAPER_ACTIVE,
        scraper_paused=SCRAPER_PAUSED,
    )


@bp.route("/start", methods=["POST"])
def start_scraper():
    """Start the scraper."""
    global SCRAPER_ACTIVE, SCRAPER_PAUSED

    if not SCRAPER_ACTIVE:
        SCRAPER_ACTIVE = True
        SCRAPER_PAUSED = False

        # Log the action
        ActivityLog.log_scraper_command(
            action="start_scraper",
            status="success",
            description="Scraper started manually",
        )

        # Start the scheduler task
        task = dummy_scraper_scheduler.delay()

        return jsonify({
            "success": True,
            "message": "Scraper started",
            "task_id": task.id,
        })
    else:
        return jsonify({
            "success": False,
            "message": "Scraper is already running",
        })


@bp.route("/stop", methods=["POST"])
def stop_scraper():
    """Stop the scraper."""
    global SCRAPER_ACTIVE, SCRAPER_PAUSED

    if SCRAPER_ACTIVE:
        SCRAPER_ACTIVE = False
        SCRAPER_PAUSED = False

        ActivityLog.log_scraper_command(
            action="stop_scraper",
            status="success",
            description="Scraper stopped manually",
        )

        return jsonify({
            "success": True,
            "message": "Scraper stopped",
        })
    else:
        return jsonify({
            "success": False,
            "message": "Scraper is not running",
        })


@bp.route("/pause", methods=["POST"])
def pause_scraper():
    """Pause or resume the scraper."""
    global SCRAPER_ACTIVE, SCRAPER_PAUSED

    if SCRAPER_ACTIVE and not SCRAPER_PAUSED:
        SCRAPER_PAUSED = True

        ActivityLog.log_scraper_command(
            action="pause_scraper",
            status="success",
            description="Scraper paused manually",
        )

        return jsonify({
            "success": True,
            "message": "Scraper paused",
        })
    elif SCRAPER_ACTIVE and SCRAPER_PAUSED:
        SCRAPER_PAUSED = False

        ActivityLog.log_scraper_command(
            action="resume_scraper",
            status="success",
            description="Scraper resumed manually",
        )

        return jsonify({
            "success": True,
            "message": "Scraper resumed",
        })
    else:
        return jsonify({
            "success": False,
            "message": "Scraper is not running",
        })


@bp.route("/status")
def scraper_status():
    """Get the current status of the scraper."""
    return jsonify({
        "active": SCRAPER_ACTIVE,
        "paused": SCRAPER_PAUSED,
        "current_hour": datetime.now().hour,
    })


@bp.route("/stats")
def scraper_stats():
    """Get scraper statistics for the dashboard."""
    # Get the last 24 hours of activity by default
    hours = 24
    if request.args.get("hours"):
        try:
            hours = int(request.args.get("hours"))
        except ValueError:
            pass

    cutoff_time = datetime.utcnow().replace(minute=0, second=0, microsecond=0)

    # Get activity logs for scraper actions within the window.
    # Use timedelta arithmetic here: replace(hour=cutoff_time.hour - hours)
    # would raise ValueError whenever the result is negative.
    logs = ActivityLog.query.filter(
        ActivityLog.category == ActivityCategory.SCRAPER_ACTIVITY.value,
        ActivityLog.timestamp >= cutoff_time - timedelta(hours=hours),
    ).all()

    # Group by hour and status
    stats = {}
    for hour in range(hours):
        target_hour = (cutoff_time.hour - hour) % 24
        stats[target_hour] = {
            "success": 0,
            "error": 0,
            "pending": 0,
            "hour": target_hour,
        }

    for log in logs:
        hour = log.timestamp.hour
        if hour in stats:
            if log.status == "success":
                stats[hour]["success"] += 1
            elif log.status == "error":
                stats[hour]["error"] += 1
            elif log.status == "pending":
                stats[hour]["pending"] += 1

    # Convert to list for easier consumption by JavaScript
    result = [stats[hour] for hour in sorted(stats.keys())]

    return jsonify(result)


@bp.route("/update_config", methods=["POST"])
def update_config():
    """Update scraper configuration."""
    data = request.json

    try:
        if "volume" in data:
            try:
                new_volume = float(data["volume"])

                # Validate volume value (from schedule.py)
                if new_volume <= 0 or new_volume > 1000:
                    return jsonify({
                        "success": False,
                        "message": "Volume must be between 1 and 1000",
                    })

                volume_config = VolumeConfig.query.first()
                if not volume_config:
                    volume_config = VolumeConfig(volume=new_volume)
                    db.session.add(volume_config)
                else:
                    old_value = volume_config.volume
                    volume_config.volume = new_volume
                    ActivityLog.log_config_change(
                        config_key="scraper_volume",
                        old_value=old_value,
                        new_value=new_volume,
                        description="Updated scraper volume",
                    )

                db.session.commit()
            except (ValueError, TypeError):
                return jsonify({
                    "success": False,
                    "message": "Invalid volume value",
                })

        if "schedule" in data:
            try:
                schedule = data["schedule"]

                # Validate entire schedule
                for hour_str, weight in schedule.items():
                    try:
                        hour = int(hour_str)
                        weight = float(weight)

                        if hour < 0 or hour > 23:
                            return jsonify({
                                "success": False,
                                "message": f"Hour value must be between 0 and 23, got {hour}",
                            })

                        if weight < 0.1 or weight > 5:
                            return jsonify({
                                "success": False,
                                "message": f"Weight for hour {hour} must be between 0.1 and 5, got {weight}",
                            })
                    except ValueError:
                        return jsonify({
                            "success": False,
                            "message": f"Invalid data format for hour {hour_str}",
                        })

                # Update schedule after validation
                for hour_str, weight in schedule.items():
                    hour = int(hour_str)
                    weight = float(weight)

                    schedule_config = ScheduleConfig.query.get(hour)
                    if not schedule_config:
                        schedule_config = ScheduleConfig(hour=hour, weight=weight)
                        db.session.add(schedule_config)
                    else:
                        old_value = schedule_config.weight
                        schedule_config.weight = weight
                        ActivityLog.log_config_change(
                            config_key=f"schedule_hour_{hour}",
                            old_value=old_value,
                            new_value=weight,
                            description=f"Updated schedule weight for hour {hour}",
                        )

                db.session.commit()
            except Exception as e:
                db.session.rollback()
                return jsonify({
                    "success": False,
                    "message": f"Error updating schedule: {str(e)}",
                })

        return jsonify({"success": True, "message": "Configuration updated"})

    except Exception as e:
        db.session.rollback()
        return jsonify({"success": False, "message": f"Unexpected error: {str(e)}"})


@bp.route("/schedule", methods=["GET", "POST"])
def schedule():
    """Legacy route to maintain compatibility with the schedule blueprint."""
    # For GET requests, redirect to the scraper index with the schedule tab active
    if request.method == "GET":
        return index()

    # For POST requests, handle form data and process like the original schedule blueprint
    if request.method == "POST":
        try:
            # Check if we're updating volume or schedule
            if "total_volume" in request.form:
                # Volume update
                try:
                    new_volume = float(request.form.get("total_volume", 0))
                    if new_volume <= 0 or new_volume > 1000:
                        raise ValueError("Volume must be between 1 and 1000")

                    volume_config = VolumeConfig.query.first()
                    if not volume_config:
                        volume_config = VolumeConfig(volume=new_volume)
                        db.session.add(volume_config)
                    else:
                        volume_config.volume = new_volume

                    db.session.commit()
                    flash("Volume updated successfully!", "success")

                except ValueError as e:
                    db.session.rollback()
                    flash(f"Error updating volume: {str(e)}", "error")
            else:
                # Schedule update logic: validate form data first
                for hour in range(24):
                    key = f"hour_{hour}"
                    if key not in request.form:
                        raise ValueError(f"Missing data for hour {hour}")

                    # Parse first, then range-check, so an out-of-range
                    # weight is not masked by the parse-error message
                    try:
                        weight = float(request.form.get(key, 0))
                    except ValueError:
                        raise ValueError(f"Invalid weight value for hour {hour}")
                    if weight < 0 or weight > 5:
                        raise ValueError(
                            f"Weight for hour {hour} must be between 0 and 5"
                        )

                # Update database if validation passes
                for hour in range(24):
                    key = f"hour_{hour}"
                    weight = float(request.form.get(key, 0))
                    config = ScheduleConfig.query.get(hour)
                    if config:
                        config.weight = weight
                    else:
                        db.session.add(ScheduleConfig(hour=hour, weight=weight))

                db.session.commit()
                flash("Schedule updated successfully!", "success")

        except ValueError as e:
            db.session.rollback()
            flash(f"Error updating schedule: {str(e)}", "error")

    # Redirect back to the scraper page
    return index()


# Calculate schedule information for visualization/decision making
def get_schedule_stats():
    """Get statistics about the current schedule configuration."""
    volume_config = VolumeConfig.query.first()
    if not volume_config:
        return {"error": "No volume configuration found"}

    total_volume = volume_config.volume
    schedule_configs = ScheduleConfig.query.all()

    if not schedule_configs:
        return {"error": "No schedule configuration found"}

    # Calculate total weight
    total_weight = sum(config.weight for config in schedule_configs)

    # Calculate papers per hour
    papers_per_hour = {}
    for config in schedule_configs:
        weight_ratio = config.weight / total_weight if total_weight > 0 else 0
        papers = weight_ratio * total_volume
        papers_per_hour[config.hour] = papers

    return {
        "total_volume": total_volume,
        "total_weight": total_weight,
        "papers_per_hour": papers_per_hour,
    }


# Enhanced API route to get schedule information
@bp.route("/schedule_info")
def schedule_info():
    """Get information about the current schedule configuration."""
    stats = get_schedule_stats()
    return jsonify(stats)


# Define the Celery tasks
@celery.task(bind=True)
def dummy_scraper_scheduler(self):
    """Main scheduler task for the dummy scraper."""
    global SCRAPER_ACTIVE, SCRAPER_PAUSED

    if not SCRAPER_ACTIVE:
        return {"status": "Scraper not active"}

    if SCRAPER_PAUSED:
        return {"status": "Scraper paused"}

    # Calculate how many papers to scrape based on current hour and configuration
    current_hour = datetime.now().hour
    hour_config = ScheduleConfig.query.get(current_hour)
    volume_config = VolumeConfig.query.first()

    if not hour_config or not volume_config:
        return {"status": "Missing configuration"}

    # Calculate papers to scrape this hour
    hourly_rate = volume_config.volume / 24  # Base rate per hour
    adjusted_rate = hourly_rate * (1 / hour_config.weight)  # Adjust by weight
    papers_to_scrape = int(adjusted_rate)

    # Log the scheduling decision
    ActivityLog.log_scraper_activity(
        action="schedule_papers",
        status="success",
        description=f"Scheduled {papers_to_scrape} papers for scraping at hour {current_hour}",
        hourly_rate=hourly_rate,
        weight=hour_config.weight,
        adjusted_rate=adjusted_rate,
    )

    # Launch individual scraping tasks
    for _ in range(papers_to_scrape):
        if not SCRAPER_ACTIVE or SCRAPER_PAUSED:
            break

        # Schedule a new paper to be scraped
        dummy_scrape_paper.delay()

    # Schedule the next run in 5 minutes if still active
    if SCRAPER_ACTIVE:
        dummy_scraper_scheduler.apply_async(countdown=300)  # 5 minutes

    return {"status": "success", "papers_scheduled": papers_to_scrape}


@celery.task(bind=True)
def dummy_scrape_paper(self):
    """Simulate scraping a single paper."""
    # Simulate success or failure
    success = random.random() > 0.3  # 70% success rate

    # Simulate processing time
    time.sleep(random.randint(2, 5))  # 2-5 seconds

    if success:
        # Create a dummy paper
        new_paper = PaperMetadata(
            title=f"Dummy Paper {random.randint(1000, 9999)}",
            doi=f"10.1234/dummy.{random.randint(1000, 9999)}",
            journal=random.choice([
                "Nature", "Science", "PLOS ONE", "Journal of Dummy Research",
                "Proceedings of the Dummy Society", "Cell", "Dummy Review Letters",
            ]),
            type="article",
            language="en",
            published_online=datetime.now().date(),
            status="Done",
            file_path="/path/to/dummy/paper.pdf",
        )

        db.session.add(new_paper)
        db.session.commit()

        # Log the successful scrape
        ActivityLog.log_scraper_activity(
            action="scrape_paper",
            paper_id=new_paper.id,
            status="success",
            description=f"Successfully scraped paper {new_paper.doi}",
        )

        return {
            "success": True,
            "paper_id": new_paper.id,
            "title": new_paper.title,
            "doi": new_paper.doi,
        }
    else:
        # Log the failed scrape
        error_message = random.choice([
            "Connection timeout",
            "404 Not Found",
            "Access denied",
            "Invalid DOI format",
            "PDF download failed",
            "Rate limited by publisher",
        ])

        ActivityLog.log_scraper_activity(
            action="scrape_paper",
            status="error",
            description=f"Failed to scrape paper: {error_message}",
        )

        return {
            "success": False,
            "error": error_message,
        }
```
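The weight-based split in `get_schedule_stats` divides the daily volume across hours in proportion to each hour's weight. A self-contained sketch of that arithmetic (the function name and the three-hour weights are made up for illustration):

```python
def papers_per_hour(weights: dict, total_volume: float) -> dict:
    """Split total_volume across hours in proportion to each hour's weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        # Degenerate case: no positive weights, nothing to allocate
        return {hour: 0.0 for hour in weights}
    return {hour: (w / total_weight) * total_volume for hour, w in weights.items()}


# Example: three hours with weights 1, 2, and 1 sharing 100 papers per day
allocation = papers_per_hour({0: 1.0, 1: 2.0, 2: 1.0}, 100)
print(allocation)  # {0: 25.0, 1: 50.0, 2: 25.0}
```

The allocations always sum back to the total volume (up to float rounding), which is what the schedule tab visualizes per hour block.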
```diff
@@ -7,6 +7,9 @@
         </button>
         <div class="collapse navbar-collapse" id="navbarSupportedContent">
           <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+            <li class="nav-item">
+              <a class="nav-link" href="{{ url_for('scraper.index') }}">Scraper</a>
+            </li>
             <li class="nav-item">
               <a class="nav-link" href="{{ url_for('upload.upload') }}">Import CSV</a>
             </li>
```
```diff
@@ -144,13 +144,13 @@
       </th>
       <th>
         {% set params = request.args.to_dict() %}
-        {% set params = params.update({'sort_by': 'journal', 'sort_dir': journal_sort}) or params %}
-        <a href="{{ url_for('papers.list_papers', **params) }}">Journal</a>
+        {% set params = params.update({'sort_by': 'doi', 'sort_dir': doi_sort}) or params %}
+        <a href="{{ url_for('papers.list_papers', **params) }}">DOI</a>
       </th>
       <th>
         {% set params = request.args.to_dict() %}
-        {% set params = params.update({'sort_by': 'doi', 'sort_dir': doi_sort}) or params %}
-        <a href="{{ url_for('papers.list_papers', **params) }}">DOI</a>
+        {% set params = params.update({'sort_by': 'journal', 'sort_dir': journal_sort}) or params %}
+        <a href="{{ url_for('papers.list_papers', **params) }}">Journal</a>
       </th>
       <th>
         {% set params = request.args.to_dict() %}
@@ -186,10 +186,9 @@
           <path
             d="M9.5 1a.5.5 0 0 1 .5.5v1a.5.5 0 0 1-.5.5h-3a.5.5 0 0 1-.5-.5v-1a.5.5 0 0 1 .5-.5h3zm-3-1A1.5 1.5 0 0 0 5 1.5v1A1.5 1.5 0 0 0 6.5 4h3A1.5 1.5 0 0 0 11 2.5v-1A1.5 1.5 0 0 0 9.5 0h-3z" />
         </svg>
-        {{ paper.title }}
+        {{ paper.title|escape }}
       </a>
     </td>
-    <td>{{ paper.journal }}</td>
     <td>
       <a href="https://doi.org/{{ paper.doi }}" target="_blank" class="icon-link icon-link-hover">
         {{ paper.doi }}
@@ -199,7 +198,17 @@
       </svg>
     </a>
   </td>
-  <td>{{ paper.issn }}</td>
+  <td>{{ paper.journal }}</td>
+  <td>
+    <a href="https://search.worldcat.org/search?q=issn:{{ paper.issn }}" target="_blank"
+      class="icon-link icon-link-hover">
+      {{ paper.issn }}
+      <svg xmlns="http://www.w3.org/2000/svg" class="bi" viewBox="0 0 16 16" aria-hidden="true">
+        <path
+          d="M1 8a.5.5 0 0 1 .5-.5h11.793l-3.147-3.146a.5.5 0 0 1 .708-.708l4 4a.5.5 0 0 1 0 .708l-4 4a.5.5 0 0 1-.708-.708L13.293 8.5H1.5A.5.5 0 0 1 1 8z" />
+      </svg>
+    </a>
+  </td>
   <td>{{ paper.status }}</td>
   <td>{{ paper.created_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
   <td>{{ paper.updated_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
```
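The sorting links in the template above lean on a Jinja idiom: `dict.update()` returns `None`, so `params.update({...}) or params` evaluates to the mutated dict, letting one `{% set %}` both update and rebind `params`. The same trick in plain Python:

```python
params = {"page": "2"}

# dict.update mutates in place and returns None, so `or` falls
# through to the (now updated) dict itself
params = params.update({"sort_by": "journal", "sort_dir": "asc"}) or params

print(params)  # {'page': '2', 'sort_by': 'journal', 'sort_dir': 'asc'}
```

Note the side effect: because `update` mutates in place, subsequent uses of `params` in the same template see the accumulated keys, which is why each `<th>` re-fetches a fresh copy via `request.args.to_dict()`.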
**scipaperloader/templates/scraper.html.jinja** (new file, +755 lines)
@ -0,0 +1,755 @@
|
||||
{% extends "base.html.jinja" %}
|
||||
|
||||
{% block title %}Paper Scraper Control Panel{% endblock title %}
|
||||
|
||||
{% block styles %}
|
||||
{{ super() }}
|
||||
<style>
|
||||
.status-indicator {
|
||||
width: 15px;
|
||||
height: 15px;
|
||||
border-radius: 50%;
|
||||
display: inline-block;
|
||||
margin-right: 5px;
|
||||
}
|
||||
|
||||
.status-active {
|
||||
background-color: #28a745;
|
||||
}
|
||||
|
||||
.status-paused {
|
||||
background-color: #ffc107;
|
||||
}
|
||||
|
||||
.status-inactive {
|
||||
background-color: #dc3545;
|
||||
}
|
||||
|
||||
.stats-chart {
|
||||
height: 400px;
|
||||
}
|
||||
|
||||
.notification {
|
||||
position: fixed;
|
||||
bottom: 20px;
|
||||
right: 20px;
|
||||
max-width: 350px;
|
||||
z-index: 1050;
|
||||
}
|
||||
|
||||
/* Enhanced scheduler styles */
|
||||
.timeline {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 3px;
|
||||
user-select: none;
|
||||
}
|
||||
|
||||
.hour-block {
|
||||
width: 49px;
|
||||
height: 70px;
|
||||
border-radius: 5px;
|
||||
text-align: center;
|
||||
line-height: 1.2;
|
||||
font-size: 0.9rem;
|
||||
padding-top: 6px;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
transition: background-color 0.2s ease-in-out;
|
||||
margin: 1px;
|
||||
}
|
||||
|
||||
.hour-block.selected {
|
||||
outline: 2px solid #4584b8;
|
||||
}
|
||||
|
||||
.papers {
|
||||
font-size: 0.7rem;
|
||||
margin-top: 2px;
|
||||
}
|
||||
|
||||
/* Tab styles */
|
||||
.nav-tabs .nav-link {
|
||||
color: #495057;
|
||||
}
|
||||
|
||||
.nav-tabs .nav-link.active {
|
||||
font-weight: bold;
|
||||
color: #007bff;
|
||||
}
|
||||
|
||||
.tab-pane {
|
||||
padding-top: 1rem;
|
||||
}
|
||||
</style>
|
||||
{% endblock styles %}
|
||||
|
||||
{% block content %}
|
||||
<div class="container mt-4">
|
||||
<h1>Paper Scraper Control Panel</h1>
|
||||
|
||||
<!-- Navigation tabs -->
|
||||
<ul class="nav nav-tabs mb-4" id="scraperTabs" role="tablist">
|
||||
<li class="nav-item" role="presentation">
|
||||
<button class="nav-link active" id="dashboard-tab" data-bs-toggle="tab" data-bs-target="#dashboard"
|
||||
type="button" role="tab" aria-controls="dashboard" aria-selected="true">
|
||||
Dashboard
|
||||
</button>
|
||||
</li>
|
||||
<li class="nav-item" role="presentation">
|
||||
<button class="nav-link" id="schedule-tab" data-bs-toggle="tab" data-bs-target="#schedule" type="button"
|
||||
role="tab" aria-controls="schedule" aria-selected="false">
|
||||
Schedule Configuration
|
||||
</button>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<div class="tab-content" id="scraperTabsContent">
|
||||
<!-- Dashboard Tab -->
|
||||
<div class="tab-pane fade show active" id="dashboard" role="tabpanel" aria-labelledby="dashboard-tab">
|
||||
<div class="row mb-4">
|
||||
<div class="col-md-6">
|
||||
<div class="card">
|
||||
<div class="card-header">
|
||||
<h5>Scraper Status</h5>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<div class="d-flex align-items-center mb-3">
|
||||
<div id="statusIndicator" class="status-indicator status-inactive"></div>
|
||||
<span id="statusText">Inactive</span>
|
||||
</div>
|
||||
|
||||
<div class="btn-group" role="group">
|
||||
<button id="startButton" class="btn btn-success">Start</button>
|
||||
<button id="pauseButton" class="btn btn-warning" disabled>Pause</button>
|
||||
<button id="stopButton" class="btn btn-danger" disabled>Stop</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="col-md-6">
|
||||
<div class="card">
|
||||
<div class="card-header">
|
||||
<h5>Volume Configuration</h5>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<form id="volumeForm">
|
||||
<div class="form-group">
|
||||
<label for="volumeInput">Papers per day:</label>
|
||||
<input type="number" class="form-control" id="volumeInput"
    value="{{ volume_config.volume if volume_config else 100 }}">
</div>
<button type="submit" class="btn btn-primary mt-2">Update Volume</button>
</form>
</div>
</div>
</div>
</div>

<div class="row mb-4">
  <div class="col-12">
    <div class="card">
      <div class="card-header d-flex justify-content-between align-items-center">
        <h5>Scraping Activity</h5>
        <div>
          <div class="form-check form-switch">
            <input class="form-check-input" type="checkbox" id="notificationsToggle" checked>
            <label class="form-check-label" for="notificationsToggle">Show Notifications</label>
          </div>
        </div>
      </div>
      <div class="card-body">
        <div class="btn-group mb-3">
          <button class="btn btn-outline-secondary time-range-btn" data-hours="6">Last 6 hours</button>
          <button class="btn btn-outline-secondary time-range-btn active" data-hours="24">Last 24 hours</button>
          <button class="btn btn-outline-secondary time-range-btn" data-hours="72">Last 3 days</button>
        </div>
        <div class="stats-chart" id="activityChart"></div>
      </div>
    </div>
  </div>
</div>

<div class="row mb-4">
  <div class="col-12">
    <div class="card">
      <div class="card-header">
        <h5>Recent Activity</h5>
      </div>
      <div class="card-body">
        <div class="table-responsive">
          <table class="table table-striped">
            <thead>
              <tr>
                <th>Time</th>
                <th>Action</th>
                <th>Status</th>
                <th>Description</th>
              </tr>
            </thead>
            <tbody id="activityLog">
              <tr>
                <td colspan="4" class="text-center">Loading activities...</td>
              </tr>
            </tbody>
          </table>
        </div>
      </div>
    </div>
  </div>
</div>
</div>

<!-- Schedule Configuration Tab -->
<div class="tab-pane fade" id="schedule" role="tabpanel" aria-labelledby="schedule-tab"
  x-data="scheduleManager({{ schedule_config | tojson }}, {{ volume_config.volume if volume_config else 100 }})">

  <div class="mb-3">
    <h3>How it Works</h3>
    <p class="text-muted mb-0">
      Configure the daily volume of papers to be downloaded and the hourly download weights.
      The weights determine how many papers are downloaded during each hour of the day:
      the total volume (<strong x-text="volume"></strong> papers/day) is split across all hours
      in proportion to their relative weights.
      <strong>Higher weights result in more papers being downloaded</strong> during that hour.
    </p>
    <h5 class="mt-3">Instructions:</h5>
    <p class="text-muted">
      Click to select one or more hours below, then assign a weight to them using the input and apply it.
      Color indicates relative intensity. Changes are saved when you click "Update Schedule".
    </p>
  </div>

<div class="card mb-4">
  <div class="card-header">
    <h4 class="m-0">Volume Configuration</h4>
  </div>
  <div class="card-body">
    <p class="text-muted">
      The total volume of data to be downloaded each day is
      <strong x-text="volume"></strong> papers.
    </p>
    <div class="d-flex align-items-center mb-3">
      <div class="input-group">
        <span class="input-group-text">Papers per day:</span>
        <input type="number" class="form-control" x-model="volume" min="1" max="1000" />
        <button type="button" class="btn btn-primary" @click="updateVolume()">
          Update Volume
        </button>
      </div>
    </div>
  </div>
</div>

<div class="card">
  <div class="card-header">
    <h4 class="m-0">Hourly Weights</h4>
  </div>
  <div class="card-body">
    <div class="timeline mb-3" @mouseup="endDrag()" @mouseleave="endDrag()">
      <template x-for="hour in Object.keys(schedule)" :key="hour">
        <div class="hour-block" :id="'hour-' + hour" :data-hour="hour"
          :style="getBackgroundStyle(hour)" :class="{'selected': isSelected(hour)}"
          @mousedown="startDrag($event, hour)" @mouseover="dragSelect(hour)">
          <div><strong x-text="formatHour(hour)"></strong></div>
          <div class="weight"><span x-text="schedule[hour]"></span></div>
          <div class="papers">
            <span x-text="getPapersPerHour(hour)"></span> p.
          </div>
        </div>
      </template>
    </div>

    <div class="input-group mb-4 w-50">
      <span class="input-group-text">Set Weight:</span>
      <input type="number" step="0.1" min="0.1" max="5" x-model="newWeight" class="form-control" />
      <button type="button" class="btn btn-outline-primary" @click="applyWeight()">
        Apply to Selected
      </button>
    </div>

    <button type="button" class="btn btn-success" @click="updateSchedule()">
      💾 Update Schedule
    </button>
  </div>
</div>
</div>
</div>
</div>

<!-- Notification template -->
<div id="notificationContainer"></div>
{% endblock content %}

{% block scripts %}
{{ super() }}
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js" defer></script>
<script>
  // Alpine.js scheduler component
  function scheduleManager(initial, volume) {
    return {
      schedule: initial || {},
      volume: volume,
      selectedHours: [],
      newWeight: 1.0,
      isDragging: false,
      dragOperation: null,

      formatHour(h) {
        return String(h).padStart(2, "0") + ":00";
      },

      getBackgroundStyle(hour) {
        const weight = parseFloat(this.schedule[hour]);
        const maxWeight = 2.5; // You can adjust this

        // Normalize weight (0.0 to 1.0)
        const t = Math.min(weight / maxWeight, 1.0);

        // Interpolate HSL lightness: 95% (light) to 30% (dark)
        const lightness = 95 - t * 65; // 95 → 30
        const backgroundColor = `hsl(210, 10%, ${lightness}%)`;

        const textColor = t > 0.65 ? "white" : "black"; // adaptive text color

        return {
          backgroundColor,
          color: textColor,
        };
      },

      startDrag(event, hour) {
        event.preventDefault();
        this.isDragging = true;
        this.dragOperation = this.isSelected(hour) ? "remove" : "add";
        this.toggleSelect(hour);
      },

      dragSelect(hour) {
        if (!this.isDragging) return;
        const selected = this.isSelected(hour);
        if (this.dragOperation === "add" && !selected) {
          this.selectedHours.push(hour);
        } else if (this.dragOperation === "remove" && selected) {
          this.selectedHours = this.selectedHours.filter((h) => h !== hour);
        }
      },

      endDrag() {
        this.isDragging = false;
      },

      toggleSelect(hour) {
        if (this.isSelected(hour)) {
          this.selectedHours = this.selectedHours.filter((h) => h !== hour);
        } else {
          this.selectedHours.push(hour);
        }
      },

      isSelected(hour) {
        return this.selectedHours.includes(hour);
      },

      applyWeight() {
        this.selectedHours.forEach((hour) => {
          this.schedule[hour] = parseFloat(this.newWeight).toFixed(1);
        });
      },

      getTotalWeight() {
        return Object.values(this.schedule).reduce(
          (sum, w) => sum + parseFloat(w),
          0
        );
      },

      getPapersPerHour(hour) {
        const total = this.getTotalWeight();
        if (total === 0) return 0;
        return (
          (parseFloat(this.schedule[hour]) / total) *
          this.volume
        ).toFixed(1);
      },

      updateVolume() {
        fetch('/scraper/update_config', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ volume: parseFloat(this.volume) })
        })
          .then(response => response.json())
          .then(data => {
            if (data.success) {
              showNotification('Volume updated successfully', 'success');
              // Update the volume in the dashboard tab too
              document.getElementById('volumeInput').value = this.volume;
            } else {
              showNotification(data.message, 'danger');
            }
          });
      },

      updateSchedule() {
        fetch('/scraper/update_config', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ schedule: this.schedule })
        })
          .then(response => response.json())
          .then(data => {
            if (data.success) {
              showNotification('Schedule updated successfully', 'success');
              this.selectedHours = []; // Clear selections after update
            } else {
              showNotification(data.message, 'danger');
            }
          });
      }
    };
  }

  // Global variables for the scraper dashboard
  let notificationsEnabled = true;
  let activityChart = null;
  let currentTimeRange = 24;

  // DOM elements
  const statusIndicator = document.getElementById('statusIndicator');
  const statusText = document.getElementById('statusText');
  const startButton = document.getElementById('startButton');
  const pauseButton = document.getElementById('pauseButton');
  const stopButton = document.getElementById('stopButton');
  const notificationsToggle = document.getElementById('notificationsToggle');
  const activityLog = document.getElementById('activityLog');

  // Initialize the page
  document.addEventListener('DOMContentLoaded', function () {
    initStatusPolling();
    loadActivityStats(currentTimeRange);
    loadRecentActivity();

    // Initialize event listeners
    startButton.addEventListener('click', startScraper);
    pauseButton.addEventListener('click', togglePauseScraper);
    stopButton.addEventListener('click', stopScraper);
    notificationsToggle.addEventListener('click', toggleNotifications);

    document.getElementById('volumeForm').addEventListener('submit', function (e) {
      e.preventDefault();
      updateVolume();
    });

    document.querySelectorAll('.time-range-btn').forEach(btn => {
      btn.addEventListener('click', function () {
        document.querySelectorAll('.time-range-btn').forEach(b => b.classList.remove('active'));
        this.classList.add('active');
        currentTimeRange = parseInt(this.dataset.hours);
        loadActivityStats(currentTimeRange);
      });
    });
  });

  // Status polling
  function initStatusPolling() {
    updateStatus();
    setInterval(updateStatus, 5000); // Poll every 5 seconds
  }

  function updateStatus() {
    fetch('/scraper/status')
      .then(response => response.json())
      .then(data => {
        if (data.active) {
          if (data.paused) {
            statusIndicator.className = 'status-indicator status-paused';
            statusText.textContent = 'Paused';
            pauseButton.textContent = 'Resume';
          } else {
            statusIndicator.className = 'status-indicator status-active';
            statusText.textContent = 'Active';
            pauseButton.textContent = 'Pause';
          }
          startButton.disabled = true;
          pauseButton.disabled = false;
          stopButton.disabled = false;
        } else {
          statusIndicator.className = 'status-indicator status-inactive';
          statusText.textContent = 'Inactive';
          startButton.disabled = false;
          pauseButton.disabled = true;
          stopButton.disabled = true;
        }
      });
  }

  // Action functions
  function startScraper() {
    fetch('/scraper/start', { method: 'POST' })
      .then(response => response.json())
      .then(data => {
        if (data.success) {
          showNotification('Scraper started successfully', 'success');
          updateStatus();
          setTimeout(() => { loadRecentActivity(); }, 1000);
        } else {
          showNotification(data.message, 'danger');
        }
      });
  }

  function togglePauseScraper() {
    fetch('/scraper/pause', { method: 'POST' })
      .then(response => response.json())
      .then(data => {
        if (data.success) {
          showNotification(data.message, 'info');
          updateStatus();
          setTimeout(() => { loadRecentActivity(); }, 1000);
        } else {
          showNotification(data.message, 'danger');
        }
      });
  }

  function stopScraper() {
    fetch('/scraper/stop', { method: 'POST' })
      .then(response => response.json())
      .then(data => {
        if (data.success) {
          showNotification('Scraper stopped successfully', 'warning');
          updateStatus();
          setTimeout(() => { loadRecentActivity(); }, 1000);
        } else {
          showNotification(data.message, 'danger');
        }
      });
  }

  function updateVolume() {
    const volume = document.getElementById('volumeInput').value;

    fetch('/scraper/update_config', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      // Send a number, matching what the schedule tab's component posts
      body: JSON.stringify({ volume: parseFloat(volume) })
    })
      .then(response => response.json())
      .then(data => {
        if (data.success) {
          showNotification('Volume updated successfully', 'success');
        } else {
          showNotification(data.message, 'danger');
        }
      });
  }

  function toggleNotifications() {
    notificationsEnabled = notificationsToggle.checked;
  }

  // Load data functions
  function loadActivityStats(hours) {
    fetch(`/scraper/stats?hours=${hours}`)
      .then(response => response.json())
      .then(data => {
        renderActivityChart(data);
      });
  }

  function loadRecentActivity() {
    fetch('/api/activity_logs?category=scraper_activity&limit=20')
      .then(response => response.json())
      .then(data => {
        renderActivityLog(data);
      })
      .catch(() => {
        // If the API endpoint doesn't exist, just show a message
        activityLog.innerHTML = '<tr><td colspan="4" class="text-center">Activity log API not available</td></tr>';
      });
  }

  // Rendering functions
  function renderActivityChart(data) {
    const ctx = document.getElementById('activityChart').getContext('2d');

    // Extract the data for the chart
    const labels = data.map(item => `${item.hour}:00`);
    const successData = data.map(item => item.success);
    const errorData = data.map(item => item.error);
    const pendingData = data.map(item => item.pending);

    if (activityChart) {
      activityChart.destroy();
    }

    activityChart = new Chart(ctx, {
      type: 'bar',
      data: {
        labels: labels,
        datasets: [
          {
            label: 'Success',
            data: successData,
            backgroundColor: '#28a745',
            stack: 'Stack 0'
          },
          {
            label: 'Error',
            data: errorData,
            backgroundColor: '#dc3545',
            stack: 'Stack 0'
          },
          {
            label: 'Pending',
            data: pendingData,
            backgroundColor: '#ffc107',
            stack: 'Stack 0'
          }
        ]
      },
      options: {
        responsive: true,
        maintainAspectRatio: false,
        scales: {
          x: {
            stacked: true,
            title: {
              display: true,
              text: 'Hour'
            }
          },
          y: {
            stacked: true,
            beginAtZero: true,
            title: {
              display: true,
              text: 'Papers Scraped'
            }
          }
        }
      }
    });
  }

  function renderActivityLog(logs) {
    activityLog.innerHTML = '';

    if (!logs || logs.length === 0) {
      activityLog.innerHTML = '<tr><td colspan="4" class="text-center">No recent activity</td></tr>';
      return;
    }

    logs.forEach(log => {
      const row = document.createElement('tr');

      // Format timestamp
      const date = new Date(log.timestamp);
      const timeStr = date.toLocaleTimeString();

      // Create status badge
      let statusBadge = '';
      if (log.status === 'success') {
        statusBadge = '<span class="badge bg-success">Success</span>';
      } else if (log.status === 'error') {
        statusBadge = '<span class="badge bg-danger">Error</span>';
      } else if (log.status === 'pending') {
        statusBadge = '<span class="badge bg-warning text-dark">Pending</span>';
      } else {
        statusBadge = `<span class="badge bg-secondary">${log.status || 'Unknown'}</span>`;
      }

      row.innerHTML = `
        <td>${timeStr}</td>
        <td>${log.action}</td>
        <td>${statusBadge}</td>
        <td>${log.description || ''}</td>
      `;

      activityLog.appendChild(row);
    });
  }

  // Notification functions
  function showNotification(message, type) {
    if (!notificationsEnabled && type !== 'danger') {
      return;
    }

    const container = document.getElementById('notificationContainer');
    const notification = document.createElement('div');
    notification.className = `alert alert-${type} notification shadow-sm`;
    notification.innerHTML = `
      ${message}
      <button type="button" class="btn-close float-end" aria-label="Close"></button>
    `;

    container.appendChild(notification);

    // Add close handler
    notification.querySelector('.btn-close').addEventListener('click', () => {
      notification.remove();
    });

    // Auto-close after 5 seconds
    setTimeout(() => {
      notification.classList.add('fade');
      setTimeout(() => {
        notification.remove();
      }, 500);
    }, 5000);
  }

  // Real-time notifications: polling fallback
  function setupWebSocket() {
    // If WebSocket support is added later, implement it here.
    // For now we poll the server periodically for new papers.
    setInterval(checkForNewPapers, 10000); // Check every 10 seconds
  }

  let lastPaperTimestamp = new Date().toISOString();

  function checkForNewPapers() {
    fetch(`/api/activity_logs?category=scraper_activity&action=scrape_paper&after=${lastPaperTimestamp}&limit=5`)
      .then(response => response.json())
      .then(data => {
        if (data && data.length > 0) {
          // Update the timestamp
          lastPaperTimestamp = new Date().toISOString();

          // Show notifications for new papers
          data.forEach(log => {
            const extraData = log.extra_data ? JSON.parse(log.extra_data) : {};
            if (log.status === 'success') {
              showNotification(`New paper scraped: ${extraData.title || 'Unknown title'}`, 'success');
            } else if (log.status === 'error') {
              showNotification(`Failed to scrape paper: ${log.description}`, 'danger');
            }
          });

          // Refresh the activity chart and log
          loadActivityStats(currentTimeRange);
          loadRecentActivity();
        }
      })
      .catch(() => {
        // If the API endpoint doesn't exist, do nothing
      });
  }

  // Start checking for new papers
  setupWebSocket();
</script>
{% endblock scripts %}