Compare commits

...

12 Commits

35 changed files with 3450 additions and 896 deletions

6
.gitignore vendored

@ -9,3 +9,9 @@ dist/
*.egg-info/
.pytest_cache/
.mypy_cache/
*.db
*.R
migrations/

65
DEVELOPMENT Normal file

@ -0,0 +1,65 @@
## How to use the logger
### GUI Interactions:
```python
ActivityLog.log_gui_interaction(
    action="view_paper_details",
    description="User viewed paper details",
    paper_id=123
)
```
### Configuration Changes:
```python
# Illustrative key and values; pass the setting that actually changed
ActivityLog.log_config_change(
    config_key="scraper_volume",
    old_value=100,
    new_value=150
)
```
### Scraper Commands:
```python
ActivityLog.log_scraper_command(
    action="start_scraper",
    status="running"
)
```
### Scraper Activities:
```python
ActivityLog.log_scraper_activity(
    action="download_paper",
    paper_id=123,
    status="success",
    description="Paper downloaded successfully",
    file_path="/papers/123.pdf"
)
```
### Error Logging:
```python
# Simple error
ActivityLog.log_error(
    error_message="Failed to connect to API",
    severity=ErrorSeverity.WARNING.value,
    source="api_client"
)

# Logging an exception
try:
    result = some_risky_operation()
except Exception as e:
    ActivityLog.log_error(
        error_message="Operation failed",
        exception=e,
        severity=ErrorSeverity.ERROR.value,
        source="data_processor",
        paper_id=paper_id
    )
```
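### Extra Data:
Every logging helper accepts additional keyword arguments (such as `file_path` above) and stores them as a JSON string in the `extra_data` column. The snippet below is a minimal sketch of that round trip, not the real model: `ExtraDataSketch` and `log_sketch` are hypothetical stand-ins that only mirror `set_extra_data` and `get_extra_data`, so it runs without a database.

```python
import json


class ExtraDataSketch:
    """Hypothetical stand-in for ActivityLog; keeps only the extra_data field."""

    def __init__(self, action):
        self.action = action
        self.extra_data = None

    def set_extra_data(self, data_dict):
        # Mirrors the model: an empty dict leaves extra_data unset
        if data_dict:
            self.extra_data = json.dumps(data_dict)

    def get_extra_data(self):
        # Mirrors the model: a missing value reads back as an empty dict
        if self.extra_data:
            return json.loads(self.extra_data)
        return {}


def log_sketch(action, **extra):
    """Collect unknown keyword arguments the way the log_* helpers do."""
    log = ExtraDataSketch(action)
    log.set_extra_data(extra)
    return log


entry = log_sketch("download_paper", file_path="/papers/123.pdf")
print(entry.get_extra_data())  # prints {'file_path': '/papers/123.pdf'}
```

Because the values pass through `json.dumps`, anything placed in the extra arguments should be JSON-serializable.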


@ -1,10 +1,13 @@
# List of phony targets (targets that don't represent files)
.PHONY: all clean venv run format format-check lint mypy test dist reformat dev
.PHONY: all clean venv run format format-check lint mypy test dist reformat dev celery celery-flower redis run-all
# Define Python and pip executables inside virtual environment
PYTHON := venv/bin/python
PIP := venv/bin/pip
# Celery worker command
CELERY := venv/bin/celery
# Default target that runs the application
all: run
@ -83,11 +86,11 @@ todos:
@grep -r "TODO\|FIXME" scipaperloader || echo "No TODOs found"
# Reset the database: delete, initialize, and migrate
reset-db:
reset-db: venv
rm -f $(DB_PATH)
flask db init || true
flask db migrate -m "Initial migration"
flask db upgrade
$(PYTHON) -m flask --app scipaperloader db init || true
$(PYTHON) -m flask --app scipaperloader db migrate -m "Initial migration"
$(PYTHON) -m flask --app scipaperloader db upgrade
# Create and set up virtual environment
venv:
@ -130,3 +133,21 @@ dist: format-check lint mypy test
# Set up complete development environment
dev: clean venv
# Start Celery worker for processing tasks
celery: venv
$(CELERY) -A celery_worker:celery worker --loglevel=info
# Monitor Celery tasks with flower web interface
celery-flower: venv
$(PIP) install flower
$(CELERY) -A celery_worker:celery flower --port=5555
# Check if Redis is running, start if needed
redis:
@redis-cli ping > /dev/null 2>&1 || (echo "Starting Redis server..." && redis-server --daemonize yes)
# Run complete application stack (Flask app + Celery worker + Redis)
run-all: redis
@echo "Starting Flask and Celery..."
@$(MAKE) -j2 run celery


@ -14,7 +14,8 @@ And open it in the browser at [http://localhost:5000/](http://localhost:5000/)
## Prerequisites
Python >=3.8
- Python >=3.8
- Redis (for Celery task queue)
## Development environment
@ -40,12 +41,44 @@ Python >=3.8
add development dependencies under `project.optional-dependencies.*`; run
`make clean && make venv` to reinstall the environment
## Asynchronous Task Processing with Celery
SciPaperLoader uses Celery for processing large CSV uploads and other background tasks. This allows the application to handle large datasets reliably without blocking the web interface.
### Running Celery Components
- `make redis`: ensures Redis server is running (required for Celery)
- `make celery`: starts a Celery worker to process background tasks
- `make celery-flower`: starts Flower, a web interface for monitoring Celery tasks at http://localhost:5555
- `make run-all`: runs the entire stack (Flask app + Celery worker + Redis) in development mode
### How It Works
When you upload a CSV file through the web interface:
1. The file is sent to the server
2. A Celery task is created to process the file asynchronously
3. The browser shows a progress bar with real-time updates
4. The results are displayed when processing is complete
This architecture allows SciPaperLoader to handle CSV files with thousands of papers without timing out or blocking the web interface.
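
The progress updates come from polling the task status endpoint, which this change mounts at `/upload/task_status/<task_id>`. The following is a minimal sketch of how a client might interpret one status payload. `summarize_status` is a hypothetical helper, not part of the project, and the payload shapes are assumed to match what the `task_status` view returns (`state` plus `progress`, `result`, or `error`).

```python
from typing import Tuple


def summarize_status(payload: dict) -> Tuple[bool, int, str]:
    """Return (finished, progress_percent, message) for one status payload."""
    state = payload.get("state", "PENDING")
    if state == "PENDING":
        return False, 0, "Waiting for a worker"
    if state == "PROGRESS":
        return False, int(payload.get("progress", 0)), "Processing"
    if state == "SUCCESS":
        result = payload.get("result") or {}
        if "error" in result:
            # The task reports a caught failure through its return value
            return True, 0, f"Import failed: {result['error']}"
        message = (
            f"Added {result.get('added', 0)}, "
            f"updated {result.get('updated', 0)}, "
            f"skipped {result.get('skipped', 0)}, "
            f"errors {result.get('error_count', 0)}"
        )
        return True, 100, message
    # FAILURE, REVOKED, and any other terminal state
    error = payload.get("error", "Unknown error")
    return True, 0, f"Task ended in state {state}: {error}"
```

A polling loop would call this on each JSON response and stop as soon as `finished` is true.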
## Configuration
Default configuration is loaded from `scipaperloader.defaults` and can be
overridden by environment variables with a `FLASK_` prefix. See
[Configuring from Environment Variables](https://flask.palletsprojects.com/en/3.0.x/config/#configuring-from-environment-variables).
### Celery Configuration
The following environment variables can be set to configure Celery:
- `FLASK_CELERY_BROKER_URL`: Redis URL for the message broker (default: `redis://localhost:6379/0`)
- `FLASK_CELERY_RESULT_BACKEND`: Redis URL for storing task results (default: `redis://localhost:6379/0`)
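
As an illustration of the mapping, the sketch below resolves both settings from an environment mapping and falls back to the documented defaults. It is a standalone sketch under stated assumptions: `resolve_celery_settings` is a hypothetical helper and the host `redis.internal` is made up; in the application, Flask's own configuration loading is what strips the `FLASK_` prefix.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_REDIS_URL = "redis://localhost:6379/0"


def resolve_celery_settings(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Map the FLASK_-prefixed variables to config keys, applying defaults."""
    env = os.environ if environ is None else environ
    return {
        "CELERY_BROKER_URL": env.get("FLASK_CELERY_BROKER_URL", DEFAULT_REDIS_URL),
        "CELERY_RESULT_BACKEND": env.get(
            "FLASK_CELERY_RESULT_BACKEND", DEFAULT_REDIS_URL
        ),
    }


# Only the broker is overridden here; the result backend keeps its default
settings = resolve_celery_settings(
    {"FLASK_CELERY_BROKER_URL": "redis://redis.internal:6379/1"}
)
print(settings["CELERY_BROKER_URL"])
print(settings["CELERY_RESULT_BACKEND"])
```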
Consider using
[dotenv](https://flask.palletsprojects.com/en/3.0.x/cli/#environment-variables-from-dotenv).
@ -58,4 +91,12 @@ deliver to your server, or copy in your `Dockerfile`, and install it with `pip`.
You must set a
[SECRET_KEY](https://flask.palletsprojects.com/en/3.0.x/tutorial/deploy/#configure-the-secret-key)
in production to a secret and stable value.
in production to a secret and stable value.
### Deploying with Celery
When deploying to production:
1. Configure a production-ready Redis instance or use a managed service
2. Run Celery workers as system services or in Docker containers
3. Consider setting up monitoring for your Celery tasks and workers

7
celery_worker.py Normal file

@ -0,0 +1,7 @@
from scipaperloader.celery import celery, configure_celery
# Configure celery with Flask app
configure_celery()
if __name__ == '__main__':
celery.start()

BIN
dump.rdb Normal file

Binary file not shown.


@ -13,6 +13,10 @@ dependencies = [
"flask-wtf>=1.2.2,<2",
"pyzotero>=1.6.11,<2",
"pandas>=2.2.3,<3",
"celery>=5.5.1,<6",
"redis>=5.2.1,<6",
"flower>=2.0.1,<3",
"flask-migrate>=4.1.0,<5",
]
[project.optional-dependencies]


@ -1,18 +1,24 @@
from flask import Flask
from flask import Flask, request
from flask_migrate import Migrate # Provides the "flask db" migration commands
from .config import Config
from .db import db
from .models import init_schedule_config
from .models import ActivityLog, ActivityCategory
from .blueprints import register_blueprints
def create_app(test_config=None):
app = Flask(__name__)
app.config.from_object(Config)
# Celery configuration
app.config['CELERY_BROKER_URL'] = app.config.get('CELERY_BROKER_URL', 'redis://localhost:6379/0')
app.config['CELERY_RESULT_BACKEND'] = app.config.get('CELERY_RESULT_BACKEND', 'redis://localhost:6379/0')
if test_config:
app.config.update(test_config)
db.init_app(app)
migrate = Migrate(app, db) # Initialize Flask-Migrate for this app and database
with app.app_context():
db.create_all()
@ -22,8 +28,23 @@ def create_app(test_config=None):
def inject_app_title():
return {"app_title": app.config["APP_TITLE"]}
from . import views
register_blueprints(app)
app.register_blueprint(views.bp)
return app
@app.before_request
def before_request():
# Skip logging for static files, health checks, or other frequent requests
if request.path.startswith('/static/') or request.path == '/health' or request.path == '/favicon.ico':
return
# Skip task status checks to avoid log spam (the upload blueprint is mounted at /upload)
if request.path.startswith('/upload/task_status/'):
return
action = request.endpoint or request.path or "unknown_request"
ActivityLog.log_gui_interaction(
action=action,
description=f"Request to {request.path}",
extra={"method": request.method, "url": request.url}
)
return app


@ -0,0 +1,17 @@
"""Blueprint registration module."""
from flask import Flask
from .main import bp as main_bp
from .papers import bp as papers_bp
from .upload import bp as upload_bp
from .schedule import bp as schedule_bp
from .logger import bp as logger_bp
def register_blueprints(app: Flask):
"""Register all blueprints with the Flask application."""
app.register_blueprint(main_bp)
app.register_blueprint(papers_bp, url_prefix='/papers')
app.register_blueprint(upload_bp, url_prefix='/upload')
app.register_blueprint(schedule_bp, url_prefix='/schedule')
app.register_blueprint(logger_bp, url_prefix='/logs')


@ -0,0 +1,112 @@
"""Logger view."""
import csv
import io
import datetime
from flask import Blueprint, render_template, request, send_file
from ..db import db
from ..models import ActivityLog, ActivityCategory
bp = Blueprint("logger", __name__, url_prefix="/logs")
@bp.route("/")
def list_logs():
page = request.args.get("page", 1, type=int)
per_page = 50
# Filters
category = request.args.get("category")
start_date = request.args.get("start_date")
end_date = request.args.get("end_date")
search_term = request.args.get("search_term")
if search_term == "None":
search_term = None
query = ActivityLog.query
if category:
query = query.filter(ActivityLog.category == category)
if start_date:
start_date_dt = datetime.datetime.strptime(start_date, "%Y-%m-%d")
query = query.filter(ActivityLog.timestamp >= start_date_dt)
if end_date:
end_date_dt = datetime.datetime.strptime(end_date, "%Y-%m-%d") + datetime.timedelta(days=1)
query = query.filter(ActivityLog.timestamp < end_date_dt) # Exclusive bound: end_date_dt is midnight of the following day
if search_term:
query = query.filter(db.or_(
ActivityLog.action.contains(search_term),
ActivityLog.description.contains(search_term)
))
pagination = query.order_by(ActivityLog.timestamp.desc()).paginate(page=page, per_page=per_page, error_out=False)
categories = [e.value for e in ActivityCategory]
return render_template(
"logger.html.jinja",
logs=pagination.items,
pagination=pagination,
categories=categories,
category=category,
start_date=start_date,
end_date=end_date,
search_term=search_term,
app_title="PaperScraper",
)
@bp.route("/download")
def download_logs():
# Filters - reuse logic from list_logs
category = request.args.get("category")
start_date = request.args.get("start_date")
end_date = request.args.get("end_date")
search_term = request.args.get("search_term")
query = ActivityLog.query
if category:
query = query.filter(ActivityLog.category == category)
if start_date:
start_date_dt = datetime.datetime.strptime(start_date, "%Y-%m-%d")
query = query.filter(ActivityLog.timestamp >= start_date_dt)
if end_date:
end_date_dt = datetime.datetime.strptime(end_date, "%Y-%m-%d") + datetime.timedelta(days=1)
query = query.filter(ActivityLog.timestamp < end_date_dt) # Exclusive bound: end_date_dt is midnight of the following day
if search_term:
query = query.filter(db.or_(
ActivityLog.action.contains(search_term),
ActivityLog.description.contains(search_term)
))
logs = query.order_by(ActivityLog.timestamp.desc()).all()
# Prepare CSV data
csv_data = io.StringIO()
csv_writer = csv.writer(csv_data)
csv_writer.writerow(["Timestamp", "Category", "Action", "Description", "Extra Data"]) # Header
for log in logs:
csv_writer.writerow([
log.timestamp,
log.category,
log.action,
log.description,
log.extra_data # Consider formatting this better
])
# Create response
filename = f"logs_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
return send_file(
io.BytesIO(csv_data.getvalue().encode("utf-8")), # send_file requires a binary stream
mimetype="text/csv",
as_attachment=True,
download_name=filename
)
@bp.route("/<int:log_id>/detail")
def log_detail(log_id):
log = ActivityLog.query.get_or_404(log_id)
return render_template("partials/log_detail_modal.html.jinja", log=log)


@ -0,0 +1,19 @@
"""Main routes for the application."""
from flask import Blueprint, render_template
bp = Blueprint("main", __name__)
@bp.route("/")
def index():
return render_template("index.html.jinja")
@bp.route("/logs")
def logs():
return render_template("logs.html.jinja", app_title="PaperScraper")
@bp.route("/about")
def about():
return render_template("about.html.jinja", app_title="PaperScraper")


@ -0,0 +1,140 @@
"""Paper management routes."""
import csv
import datetime
import io
from flask import (
Blueprint,
render_template,
request,
send_file,
)
from sqlalchemy import asc, desc
from ..db import db
from ..models import PaperMetadata
bp = Blueprint("papers", __name__)
@bp.route("/")
def list_papers():
page = request.args.get("page", 1, type=int)
per_page = 50
# Filters
status = request.args.get("status")
created_from = request.args.get("created_from")
created_to = request.args.get("created_to")
updated_from = request.args.get("updated_from")
updated_to = request.args.get("updated_to")
sort_by = request.args.get("sort_by", "created_at")
sort_dir = request.args.get("sort_dir", "desc")
query = PaperMetadata.query
# Apply filters
if status:
query = query.filter(PaperMetadata.status == status)
def parse_date(val):
from datetime import datetime
try:
return datetime.strptime(val, "%Y-%m-%d")
except (ValueError, TypeError):
return None
if created_from := parse_date(created_from):
query = query.filter(PaperMetadata.created_at >= created_from)
if created_to := parse_date(created_to):
query = query.filter(PaperMetadata.created_at <= created_to)
if updated_from := parse_date(updated_from):
query = query.filter(PaperMetadata.updated_at >= updated_from)
if updated_to := parse_date(updated_to):
query = query.filter(PaperMetadata.updated_at <= updated_to)
# Sorting
sort_col = getattr(PaperMetadata, sort_by, PaperMetadata.created_at)
sort_func = desc if sort_dir == "desc" else asc
query = query.order_by(sort_func(sort_col))
# Pagination
pagination = query.paginate(page=page, per_page=per_page, error_out=False)
# Statistics
total_papers = PaperMetadata.query.count()
status_counts = (
db.session.query(PaperMetadata.status, db.func.count(PaperMetadata.status))
.group_by(PaperMetadata.status)
.all()
)
status_counts = {status: count for status, count in status_counts}
return render_template(
"papers.html.jinja",
papers=pagination.items,
pagination=pagination,
total_papers=total_papers,
status_counts=status_counts,
sort_by=sort_by,
sort_dir=sort_dir,
)
@bp.route("/export")
def export_papers():
# Filters
status = request.args.get("status")
created_from = request.args.get("created_from")
created_to = request.args.get("created_to")
updated_from = request.args.get("updated_from")
updated_to = request.args.get("updated_to")
sort_by = request.args.get("sort_by", "created_at")
sort_dir = request.args.get("sort_dir", "desc")
query = PaperMetadata.query
# Apply filters
if status:
query = query.filter(PaperMetadata.status == status)
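def parse_date(val):
    try:
        return datetime.datetime.strptime(val, "%Y-%m-%d")
    except (ValueError, TypeError):
        return None

# Apply the same date filters and sorting as list_papers, so the export
# matches what the user sees on screen
if created_from := parse_date(created_from):
    query = query.filter(PaperMetadata.created_at >= created_from)
if created_to := parse_date(created_to):
    query = query.filter(PaperMetadata.created_at <= created_to)
if updated_from := parse_date(updated_from):
    query = query.filter(PaperMetadata.updated_at >= updated_from)
if updated_to := parse_date(updated_to):
    query = query.filter(PaperMetadata.updated_at <= updated_to)
sort_col = getattr(PaperMetadata, sort_by, PaperMetadata.created_at)
sort_func = desc if sort_dir == "desc" else asc
query = query.order_by(sort_func(sort_col))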
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(
["ID", "Title", "Journal", "DOI", "ISSN", "Status", "Created At", "Updated At"]
)
for paper in query:
writer.writerow(
[
paper.id,
paper.title,
paper.journal,
paper.doi,
paper.issn,
paper.status,
paper.created_at,
paper.updated_at,
]
)
output.seek(0)
return send_file(
io.BytesIO(output.read().encode("utf-8")),
mimetype="text/csv",
as_attachment=True,
download_name="papers.csv",
)
@bp.route("/<int:paper_id>/detail")
def paper_detail(paper_id):
paper = PaperMetadata.query.get_or_404(paper_id)
return render_template("partials/paper_detail_modal.html.jinja", paper=paper)


@ -0,0 +1,79 @@
"""Schedule configuration routes."""
from flask import Blueprint, flash, render_template, request
from ..db import db
from ..models import ScheduleConfig, VolumeConfig
bp = Blueprint("schedule", __name__)
@bp.route("/", methods=["GET", "POST"])
def schedule():
if request.method == "POST":
try:
# Check if we're updating volume or schedule
if "total_volume" in request.form:
# Volume update
try:
new_volume = float(request.form.get("total_volume", 0))
if new_volume <= 0 or new_volume > 1000:
raise ValueError("Volume must be greater than 0 and at most 1000")
volume_config = VolumeConfig.query.first()
if not volume_config:
volume_config = VolumeConfig(volume=new_volume)
db.session.add(volume_config)
else:
volume_config.volume = new_volume
db.session.commit()
flash("Volume updated successfully!", "success")
except ValueError as e:
db.session.rollback()
flash(f"Error updating volume: {str(e)}", "error")
else:
# Schedule update logic
# Validate form data
for hour in range(24):
key = f"hour_{hour}"
if key not in request.form:
raise ValueError(f"Missing data for hour {hour}")
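try:
    weight = float(request.form.get(key, 0))
except ValueError:
    raise ValueError(f"Invalid weight value for hour {hour}")
# The range check sits outside the try block so its message is not
# replaced by the generic "Invalid weight value" error
if weight < 0 or weight > 5:
    raise ValueError(f"Weight for hour {hour} must be between 0 and 5")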
# Update database if validation passes
for hour in range(24):
key = f"hour_{hour}"
weight = float(request.form.get(key, 0))
config = ScheduleConfig.query.get(hour)
if config:
config.weight = weight
else:
db.session.add(ScheduleConfig(hour=hour, weight=weight))
db.session.commit()
flash("Schedule updated successfully!", "success")
except ValueError as e:
db.session.rollback()
flash(f"Error updating schedule: {str(e)}", "error")
schedule = {
sc.hour: sc.weight
for sc in ScheduleConfig.query.order_by(ScheduleConfig.hour).all()
}
volume = VolumeConfig.query.first()
return render_template(
"schedule.html.jinja",
schedule=schedule,
volume=volume.volume if volume else 0,
app_title="PaperScraper",
)


@ -0,0 +1,261 @@
"""Upload functionality for paper metadata."""
import codecs
import csv
import datetime
from io import StringIO
import json
import pandas as pd
from flask import (
Blueprint,
flash,
jsonify,
redirect,
render_template,
request,
send_file,
session,
url_for,
current_app
)
from ..db import db
from ..models import PaperMetadata, ActivityLog
from ..celery import celery # Import the celery instance directly
bp = Blueprint("upload", __name__)
REQUIRED_COLUMNS = {"alternative_id", "journal", "doi", "issn", "title"}
CHUNK_SIZE = 100 # Number of rows to process per batch
def parse_date(date_str):
"""Parse date string into datetime object."""
if not date_str or pd.isna(date_str):
return None
try:
return datetime.datetime.strptime(date_str, "%Y-%m-%d")
except ValueError:
return None
@bp.route("/", methods=["GET", "POST"])
def upload():
if request.method == "POST":
file = request.files.get("file")
delimiter = request.form.get("delimiter", ",")
duplicate_strategy = request.form.get("duplicate_strategy", "skip")
if not file:
return jsonify({"error": "No file selected."})
stream = codecs.iterdecode(file.stream, "utf-8")
content = "".join(stream)
# Trigger the Celery task
task = process_csv.delay(content, delimiter, duplicate_strategy)
return jsonify({"task_id": task.id})
return render_template("upload.html.jinja")
@celery.task(bind=True)
def process_csv(self, file_content, delimiter, duplicate_strategy):
"""Process CSV file and import paper metadata."""
# With the ContextTask in place, we're already inside an app context
added_count = skipped_count = updated_count = error_count = 0
errors = []
skipped_records = [] # Details of skipped rows, reported back to the client
try:
# Log the start of import using ActivityLog model
ActivityLog.log_import_activity(
action="start_csv_import",
status="processing",
description=f"Starting CSV import with strategy: {duplicate_strategy}",
file_size=len(file_content),
delimiter=delimiter
)
# Set initial progress percentage
self.update_state(state='PROGRESS', meta={'progress': 10})
# Read CSV into chunks
csv_buffer = StringIO(file_content)
# Count total chunks
csv_buffer.seek(0)
total_chunks = len(list(pd.read_csv(csv_buffer, delimiter=delimiter, chunksize=CHUNK_SIZE)))
csv_buffer.seek(0)
# Process each chunk of rows
for chunk_idx, chunk in enumerate(pd.read_csv(csv_buffer, delimiter=delimiter, chunksize=CHUNK_SIZE)):
for index, row in chunk.iterrows():
try:
doi = str(row.get("doi", "N/A"))
# Validate required fields
if pd.isna(row.get("title")) or pd.isna(row.get("doi")) or pd.isna(row.get("issn")):
raise ValueError("Missing required fields")
# Try finding an existing record based on DOI
existing = db.session.query(PaperMetadata).filter_by(doi=doi).first()
if existing:
if duplicate_strategy == "update":
existing.title = row["title"]
existing.alt_id = row.get("alternative_id")
existing.issn = row["issn"]
existing.journal = row.get("journal")
existing.published_online = parse_date(row.get("published_online"))
updated_count += 1
else:
# Track why this record was skipped
skipped_records.append({
"row": index + 2,
"doi": doi,
"reason": f"Duplicate DOI found and strategy is '{duplicate_strategy}'"
})
skipped_count += 1
continue
else:
metadata = PaperMetadata(
title=row["title"],
doi=doi,
alt_id=row.get("alternative_id"),
issn=row["issn"],
journal=row.get("journal"),
published_online=parse_date(row.get("published_online")),
status="New",
)
db.session.add(metadata)
added_count += 1
except Exception as e:
error_count += 1
errors.append({"row": index + 2, "doi": row.get("doi", "N/A"), "error": str(e)})
# Commit the chunk and roll session fresh
db.session.commit()
# Log periodic progress every 5 chunks
if (chunk_idx + 1) % 5 == 0:
ActivityLog.log_import_activity(
action="import_progress",
status="processing",
description=f"Processed {chunk_idx+1}/{total_chunks} chunks",
current_stats={
"added": added_count,
"updated": updated_count,
"skipped": skipped_count,
"errors": error_count
}
)
progress = min(90, 10 + int((chunk_idx + 1) * 80 / total_chunks))
self.update_state(state='PROGRESS', meta={'progress': progress})
# Final progress update and completion log
self.update_state(state='PROGRESS', meta={'progress': 100})
ActivityLog.log_import_activity(
action="complete_csv_import",
status="success",
description="CSV import completed",
stats={
"added": added_count,
"updated": updated_count,
"skipped": skipped_count,
"errors": error_count
}
)
except Exception as e:
db.session.rollback()
ActivityLog.log_error(
error_message="CSV import failed",
exception=e,
severity="error",
source="upload.process_csv"
)
return {'error': str(e), 'progress': 0}
finally:
db.session.remove()
# If there were errors, store an error CSV for potential download
if errors:
try:
error_csv = StringIO()
writer = csv.DictWriter(error_csv, fieldnames=["row", "doi", "error"])
writer.writeheader()
writer.writerows(errors)
ActivityLog.log_import_activity(
action="import_errors",
status="error",
description=f"Import completed with {error_count} errors",
error_csv=error_csv.getvalue(),
task_id=self.request.id,
error_count=error_count
)
except Exception:
# Do not fail the task if error logging fails
pass
# Update the return value to include skipped records information
return {
"added": added_count,
"updated": updated_count,
"skipped": skipped_count,
"skipped_records": skipped_records[:5], # Include up to 5 examples
"skipped_reason_summary": "Records were skipped because they already exist in the database. Use 'update' strategy to update them.",
"errors": errors[:5],
"error_count": error_count,
"task_id": self.request.id
}
@bp.route("/task_status/<task_id>")
def task_status(task_id):
"""Get status of background task."""
task = celery.AsyncResult(task_id)
if task.state == "PENDING":
response = {"state": task.state, "progress": 0}
elif task.state == "PROGRESS":
response = {
"state": task.state,
"progress": task.info.get("progress", 0)
}
elif task.state == "SUCCESS":
response = {
"state": task.state,
"result": task.result
}
else: # FAILURE, REVOKED, etc.
response = {
"state": task.state,
"error": str(task.info) if task.info else "Unknown error"
}
return jsonify(response)
@bp.route("/download_error_log/<task_id>")
def download_error_log(task_id):
# Find the most recent error log for this task
error_log = ActivityLog.query.filter(
ActivityLog.action == "import_errors",
ActivityLog.extra_data.like(f'%"{task_id}"%') # Search in JSON
).order_by(ActivityLog.timestamp.desc()).first()
if not error_log:
flash("No error data available.")
return redirect(url_for("upload.upload"))
# Get the CSV data from extra_data
extra_data = error_log.get_extra_data()
error_csv = extra_data.get("error_csv")
if not error_csv:
flash("Error data format is invalid.")
return redirect(url_for("upload.upload"))
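# send_file requires a binary stream, so encode the CSV text first
from io import BytesIO # Local import: this module only imports StringIO at the top
buffer = BytesIO(error_csv.encode("utf-8"))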
return send_file(
buffer,
mimetype="text/csv",
as_attachment=True,
download_name=f"upload_errors_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
)

43
scipaperloader/celery.py Normal file

@ -0,0 +1,43 @@
from celery import Celery
# Create Celery instance without Flask app initially
celery = Celery(
'scipaperloader',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/0',
)
def configure_celery(app=None):
"""Configure Celery with the Flask app settings and ensure tasks run in the app context."""
if app is None:
# Import here to avoid circular import
from scipaperloader import create_app
app = create_app()
# Update Celery configuration using the app settings
celery.conf.update(
broker_url=app.config.get('CELERY_BROKER_URL', 'redis://localhost:6379/0'),
result_backend=app.config.get('CELERY_RESULT_BACKEND', 'redis://localhost:6379/0'),
task_serializer='json',
accept_content=['json'],
result_serializer='json',
timezone='UTC',
enable_utc=True,
task_time_limit=3600, # 1 hour max runtime
task_soft_time_limit=3000, # 50 minutes soft limit
worker_max_tasks_per_child=10, # Restart workers after 10 tasks
worker_max_memory_per_child=1000000, # About 1 GB; the value is given in kilobytes
task_acks_late=True, # Acknowledge tasks after completion
task_reject_on_worker_lost=True, # Requeue tasks if worker dies
)
# Create a custom task class that pushes the Flask application context
class ContextTask(celery.Task):
abstract = True
def __call__(self, *args, **kwargs):
with app.app_context():
return self.run(*args, **kwargs)
celery.Task = ContextTask
return celery


@ -1,5 +1,184 @@
from .db import db
import json
from datetime import datetime
from enum import Enum
class ActivityCategory(Enum):
"""Categories for activity logs."""
GUI_INTERACTION = "gui_interaction"
CONFIG_CHANGE = "config_change"
SCRAPER_COMMAND = "scraper_command"
SCRAPER_ACTIVITY = "scraper_activity"
SYSTEM = "system"
DATA_IMPORT = "data_import"
class ErrorSeverity(Enum):
"""Severity levels for error logging."""
DEBUG = "debug"
INFO = "info"
WARNING = "warning"
ERROR = "error"
CRITICAL = "critical"
class ActivityLog(db.Model):
"""Model for logging various activities in the application."""
id = db.Column(db.Integer, primary_key=True)
timestamp = db.Column(db.DateTime, default=datetime.utcnow, index=True)
category = db.Column(db.String(50), nullable=False, index=True)
action = db.Column(db.String(100), nullable=False)
description = db.Column(db.Text)
# Reference to related entities (optional)
paper_id = db.Column(db.Integer, db.ForeignKey('paper_metadata.id'), nullable=True)
user_id = db.Column(db.Integer, nullable=True) # For future authentication
# For config changes
config_key = db.Column(db.String(100), nullable=True)
old_value = db.Column(db.Text, nullable=True)
new_value = db.Column(db.Text, nullable=True)
# For scraper activities
status = db.Column(db.String(50), nullable=True)
source_ip = db.Column(db.String(50), nullable=True)
# Extra data as JSON
extra_data = db.Column(db.Text, nullable=True)
def set_extra_data(self, data_dict):
"""Serialize extra data as JSON string."""
if data_dict:
self.extra_data = json.dumps(data_dict)
def get_extra_data(self):
"""Deserialize JSON string to dictionary."""
if self.extra_data:
return json.loads(self.extra_data)
return {}
@classmethod
def log_gui_interaction(cls, action, description=None, paper_id=None, user_id=None, **extra):
"""Log a GUI interaction."""
log = cls(
category=ActivityCategory.GUI_INTERACTION.value,
action=action,
description=description,
paper_id=paper_id,
user_id=user_id
)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
@classmethod
def log_config_change(cls, config_key, old_value, new_value, user_id=None, **extra):
"""Log a configuration change."""
log = cls(
category=ActivityCategory.CONFIG_CHANGE.value,
action=f"Changed {config_key}",
config_key=config_key,
old_value=str(old_value),
new_value=str(new_value),
user_id=user_id
)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
@classmethod
def log_scraper_command(cls, action, status=None, user_id=None, **extra):
"""Log a scraper command (start/stop/pause)."""
log = cls(
category=ActivityCategory.SCRAPER_COMMAND.value,
action=action,
status=status,
user_id=user_id
)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
@classmethod
def log_scraper_activity(cls, action, paper_id=None, status=None, description=None, **extra):
"""Log a scraper activity (downloading, processing papers, etc.)."""
log = cls(
category=ActivityCategory.SCRAPER_ACTIVITY.value,
action=action,
paper_id=paper_id,
status=status,
description=description
)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
@classmethod
def log_error(cls, error_message, exception=None, severity=ErrorSeverity.ERROR.value,
source=None, paper_id=None, user_id=None, **extra):
"""Log system errors or warnings.
Args:
error_message: Brief description of the error
exception: The exception object if available
severity: Error severity level (debug, info, warning, error, critical)
source: Component/module where the error occurred
paper_id: Related paper ID if applicable
user_id: Related user ID if applicable
**extra: Any additional data to store
"""
details = {}
if exception:
details.update({
'exception_type': type(exception).__name__,
'exception_message': str(exception)
})
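# Format the traceback from the exception object itself;
# traceback.format_exc() only works while an exception is being handled
import traceback
details['traceback'] = "".join(
    traceback.format_exception(type(exception), exception, exception.__traceback__)
)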
if source:
extra['source'] = source
log = cls(
category=ActivityCategory.SYSTEM.value,
action=f"{severity.upper()}: {error_message}"[:100], # Limit action length
description=error_message,
paper_id=paper_id,
user_id=user_id,
status=severity
)
# Add exception details to extra data
extra.update(details)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
@classmethod
def log_import_activity(cls, action, status=None, description=None, user_id=None, **extra):
"""Log data import activities (CSV uploads, bulk imports, etc.)."""
log = cls(
category=ActivityCategory.DATA_IMPORT.value,
action=action,
status=status,
description=description,
user_id=user_id
)
log.set_extra_data(extra)
db.session.add(log)
db.session.commit()
return log
class PaperMetadata(db.Model):
id = db.Column(db.Integer, primary_key=True)
@ -7,6 +186,7 @@ class PaperMetadata(db.Model):
doi = db.Column(db.String, unique=True, index=True)
alt_id = db.Column(db.String)
issn = db.Column(db.String(32))
journal = db.Column(db.String(255))
type = db.Column(db.String(50))
language = db.Column(db.String(50))
published_online = db.Column(db.Date) # or DateTime/String
@ -19,7 +199,6 @@ class PaperMetadata(db.Model):
default=db.func.current_timestamp(),
onupdate=db.func.current_timestamp(),
)
# plus maybe timestamps for created/updated
class ScheduleConfig(db.Model):


@ -1,5 +1,9 @@
.message {
padding: 10px;
font-size: 1.3em;
font-family: Arial, sans-serif;
padding: 10px;
font-size: 1.3em;
font-family: Arial, sans-serif;
}
.progress-bar {
width: 0%;
}


@ -1,4 +1,4 @@
{% extends 'base.html' %} {% block content %}
{% extends "base.html.jinja" %} {% block content %}
<h1 class="mb-4">📘 About This App</h1>
<p class="lead">
@ -107,4 +107,4 @@
<li>Anyone needing a structured way to fetch and track papers in bulk</li>
</ul>
</section>
{% endblock %}
{% endblock content %}


@ -1,22 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>{{ app_title }}</title>
<link
href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
rel="stylesheet"
/>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js"></script>
<!-- Optional Alpine.js -->
<script
defer
src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"
></script>
</head>
<body>
{% include 'nav.html' %}
<main class="container my-5">{% block content %}{% endblock %}</main>
{% include 'footer.html' %}
</body>
</html>


@ -0,0 +1,21 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="description" content="A platform to load scientific papers and manage metadata." />
<meta name="keywords" content="science, papers, research, management" />
<title>{{ app_title }}</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" />
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js"></script>
<!-- Optional Alpine.js -->
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
</head>
<body>
{% include "nav.html.jinja" %}
<main class="container my-5">{% block content %}{% endblock content %}</main>
{% include "footer.html.jinja" %}
</body>
</html>

@ -1,4 +1,5 @@
{% extends 'base.html' %} {% block content %}
{% extends "base.html.jinja" %}
{% block content %}
<div class="container text-center">
<h1 class="display-4">Welcome to SciPaperLoader</h1>
@ -16,7 +17,7 @@
(title, DOI, ISSN, etc.) are stored. Errors are reported without
aborting the batch.
</p>
<a href="/import" class="btn btn-sm btn-outline-primary">Upload Now</a>
<a href="{{ url_for('upload.upload') }}" class="btn btn-sm btn-outline-primary">Upload Now</a>
</div>
</div>
</div>
@ -29,7 +30,7 @@
A daemon process runs hourly to fetch papers using the Zotero API.
Downloads are randomized to mimic human behavior and avoid detection.
</p>
<a href="/logs" class="btn btn-sm btn-outline-secondary">View Logs</a>
<a href="{{ url_for('logger.list_logs') }}" class="btn btn-sm btn-outline-secondary">View Logs</a>
</div>
</div>
</div>
@ -43,9 +44,7 @@
inspect errors. Files are stored on disk in structured folders per
DOI.
</p>
<a href="/papers" class="btn btn-sm btn-outline-success"
>Browse Papers</a
>
<a href="{{ url_for('papers.list_papers') }}" class="btn btn-sm btn-outline-success">Browse Papers</a>
</div>
</div>
</div>
@ -59,11 +58,9 @@
volume (e.g. 2/hour at daytime, 0 at night) to match your bandwidth or
usage pattern.
</p>
<a href="/schedule" class="btn btn-sm btn-outline-warning"
>Adjust Schedule</a
>
<a href="{{ url_for('schedule.schedule') }}" class="btn btn-sm btn-outline-warning">Adjust Schedule</a>
</div>
</div>
</div>
</div>
{% endblock %}
{% endblock content %}

@ -0,0 +1,117 @@
{% extends "base.html.jinja" %}
{% block content %}
<h1>Activity Logs</h1>
<form method="get" class="mb-3">
<div class="row g-2">
<div class="col-md-3">
<label for="category" class="form-label">Category:</label>
<select name="category" id="category" class="form-select">
<option value="">All</option>
{% for cat in categories %}
<option value="{{ cat }}" {% if category==cat %}selected{% endif %}>{{ cat }}</option>
{% endfor %}
</select>
</div>
<div class="col-md-3">
<label for="start_date" class="form-label">Start Date:</label>
<input type="date" name="start_date" id="start_date" value="{{ start_date }}" class="form-control">
</div>
<div class="col-md-3">
<label for="end_date" class="form-label">End Date:</label>
<input type="date" name="end_date" id="end_date" value="{{ end_date }}" class="form-control">
</div>
<div class="col-md-3">
<label for="search_term" class="form-label">Search:</label>
<input type="text" name="search_term" id="search_term" value="{{ search_term }}" class="form-control">
</div>
</div>
<div class="mt-3">
<button type="submit" class="btn btn-primary">Filter</button>
<a href="{{ url_for('logger.download_logs', category=category, start_date=start_date, end_date=end_date, search_term=search_term) }}"
class="btn btn-secondary">Download CSV</a>
</div>
</form>
<ul class="list-group">
{% for log in logs %}
<li class="list-group-item log-item" data-log-id="{{ log.id }}">
<div class="d-flex justify-content-between align-items-center">
<div class="ms-2 me-auto">
<div class="fw-bold">{{ log.timestamp }}</div>
{{ log.action }} - {{ log.description }}
</div>
<span class="badge bg-primary rounded-pill">{{ log.category }}</span>
</div>
</li>
{% endfor %}
</ul>
{% if pagination %}
<nav aria-label="Page navigation" class="mt-4">
<ul class="pagination justify-content-center">
{% if pagination.has_prev %}
<li class="page-item">
<a class="page-link"
href="{{ url_for('logger.list_logs', page=pagination.prev_num, category=category, start_date=start_date, end_date=end_date, search_term=search_term) }}">Previous</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link">Previous</span>
</li>
{% endif %}
<li class="page-item disabled">
<span class="page-link">Page {{ pagination.page }} of {{ pagination.pages }}</span>
</li>
{% if pagination.has_next %}
<li class="page-item">
<a class="page-link"
href="{{ url_for('logger.list_logs', page=pagination.next_num, category=category, start_date=start_date, end_date=end_date, search_term=search_term) }}">Next</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link">Next</span>
</li>
{% endif %}
</ul>
</nav>
{% endif %}
<!-- Modal for log details -->
<div class="modal fade" id="logDetailModal" tabindex="-1" aria-hidden="true">
<div class="modal-dialog modal-lg modal-dialog-scrollable">
<div class="modal-content" id="log-detail-content">
<!-- Log details will be loaded here via AJAX -->
</div>
</div>
</div>
<script>
document.addEventListener("DOMContentLoaded", function () {
const modal = new bootstrap.Modal(document.getElementById('logDetailModal'));
const content = document.getElementById('log-detail-content');
document.querySelectorAll('.log-item').forEach(item => {
item.addEventListener('click', function () {
const logId = this.getAttribute('data-log-id');
fetch(`/logs/${logId}/detail`)
.then(response => response.text())
.then(html => {
content.innerHTML = html;
modal.show();
})
.catch(err => {
content.innerHTML = '<div class="modal-body text-danger">Error loading log details.</div>';
modal.show();
});
});
});
});
</script>
{% endblock content %}
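
The filter form above submits `category`, `start_date`, `end_date`, and `search_term` as query parameters. The view that consumes them is not part of this hunk, so the following is only a minimal in-memory sketch of how such filters might combine. The function name `filter_logs` and the dict-based log records are assumptions for illustration; the real view presumably builds a database query instead.

```python
from datetime import date, datetime


def filter_logs(logs, category="", start_date="", end_date="", search_term=""):
    """Return the log records that match every non-empty filter.

    Hypothetical helper: `logs` is a list of dicts with the keys
    'timestamp' (datetime), 'category', 'action' and 'description'.
    """
    # <input type="date"> submits ISO dates (YYYY-MM-DD) or an empty string.
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    needle = search_term.lower()

    matches = []
    for log in logs:
        day = log["timestamp"].date()
        if category and log["category"] != category:
            continue
        if start and day < start:
            continue
        if end and day > end:  # the end date is treated as inclusive
            continue
        # Case-insensitive search over action and description.
        text = f'{log["action"]} {log.get("description") or ""}'.lower()
        if needle and needle not in text:
            continue
        matches.append(log)
    return matches
```

Empty strings mean "no filter", which matches what the form sends when a field is left blank.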

@ -1,63 +0,0 @@
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<div class="container-fluid">
<a class="navbar-brand" href="{{ url_for('main.index') }}"
>{{ app_title }}</a
>
<button
class="navbar-toggler"
type="button"
data-bs-toggle="collapse"
data-bs-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent"
aria-expanded="false"
aria-label="Toggle navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav me-auto mb-2 mb-lg-0">
<li class="nav-item">
<a class="nav-link" href="/upload">Import CSV</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/papers">Papers</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/schedule">Schedule</a>
</li>
<li class="nav-item dropdown">
<a
class="nav-link dropdown-toggle"
href="#"
id="navbarDropdown"
role="button"
data-bs-toggle="dropdown"
aria-expanded="false"
>
More
</a>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown">
<li>
<a class="dropdown-item" href="/logs">Logs</a>
</li>
<li>
<a class="dropdown-item" href="/about">About</a>
</li>
<li>
<a class="dropdown-item" href="https://git.mbeck.cologne">Help</a>
</li>
</ul>
</li>
</ul>
<form class="d-flex">
<input
class="form-control me-2"
type="search"
placeholder="Search"
aria-label="Search"
/>
<button class="btn btn-outline-success" type="submit">Search</button>
</form>
</div>
</div>
</nav>

@ -0,0 +1,43 @@
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<div class="container-fluid">
<a class="navbar-brand" href="{{ url_for('main.index') }}">{{ app_title }}</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav me-auto mb-2 mb-lg-0">
<li class="nav-item">
<a class="nav-link" href="{{ url_for('upload.upload') }}">Import CSV</a>
</li>
<li class="nav-item">
<a class="nav-link" href="{{ url_for('papers.list_papers') }}">Papers</a>
</li>
<li class="nav-item">
<a class="nav-link" href="{{ url_for('schedule.schedule') }}">Schedule</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button" data-bs-toggle="dropdown"
aria-expanded="false">
More
</a>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown">
<li>
<a class="dropdown-item" href="{{ url_for('logger.list_logs') }}">Logs</a>
</li>
<li>
<a class="dropdown-item" href="{{ url_for('main.about') }}">About</a>
</li>
<li>
<a class="dropdown-item" href="https://git.mbeck.cologne">Help</a>
</li>
</ul>
</li>
</ul>
<form class="d-flex">
<input class="form-control me-2" type="search" placeholder="Search" aria-label="Search" />
<button class="btn btn-outline-success" type="submit">Search</button>
</form>
</div>
</div>
</nav>

@ -1,258 +0,0 @@
{% extends "base.html" %}
{% block title %}Papers{% endblock %}
{% block content %}
{# --- Sort direction logic for each column --- #}
{% set title_sort = 'asc' if sort_by != 'title' or sort_dir == 'desc' else 'desc' %}
{% set journal_sort = 'asc' if sort_by != 'journal' or sort_dir == 'desc' else 'desc' %}
{% set doi_sort = 'asc' if sort_by != 'doi' or sort_dir == 'desc' else 'desc' %}
{% set issn_sort = 'asc' if sort_by != 'issn' or sort_dir == 'desc' else 'desc' %}
{% set status_sort = 'asc' if sort_by != 'status' or sort_dir == 'desc' else 'desc' %}
{% set created_sort = 'asc' if sort_by != 'created_at' or sort_dir == 'desc' else 'desc' %}
{% set updated_sort = 'asc' if sort_by != 'updated_at' or sort_dir == 'desc' else 'desc' %}
<form method="get" class="mb-4 row g-3">
<div class="col-md-2">
<label>Status</label>
<select name="status" class="form-select">
<option value="">All</option>
{% if request.args.get('status') == 'Pending' %}
<option value="Pending" selected>Pending</option>
{% else %}
<option value="Pending">Pending</option>
{% endif %}
{% if request.args.get('status') == 'Done' %}
<option value="Done" selected>Done</option>
{% else %}
<option value="Done">Done</option>
{% endif %}
{% if request.args.get('status') == 'Failed' %}
<option value="Failed" selected>Failed</option>
{% else %}
<option value="Failed">Failed</option>
{% endif %}
</select>
</div>
<div class="col-md-2">
<label>Created from</label>
<input type="date" name="created_from" class="form-control" value="{{ request.args.get('created_from', '') }}">
</div>
<div class="col-md-2">
<label>Created to</label>
<input type="date" name="created_to" class="form-control" value="{{ request.args.get('created_to', '') }}">
</div>
<div class="col-md-2">
<label>Updated from</label>
<input type="date" name="updated_from" class="form-control" value="{{ request.args.get('updated_from', '') }}">
</div>
<div class="col-md-2">
<label>Updated to</label>
<input type="date" name="updated_to" class="form-control" value="{{ request.args.get('updated_to', '') }}">
</div>
<div class="col-md-2 d-flex align-items-end">
<button type="submit" class="btn btn-primary w-100">Filter</button>
</div>
</form>
<div class="modal fade" id="paperDetailModal" tabindex="-1" aria-hidden="true">
<div class="modal-dialog modal-lg modal-dialog-scrollable">
<div class="modal-content" id="paper-detail-content">
<!-- AJAX-loaded content will go here -->
</div>
</div>
</div>
<div class="d-flex align-items-center mb-4">
<!-- Statistics Section -->
<div class="me-auto">
<div class="list-group list-group-horizontal">
<div class="list-group-item d-flex justify-content-between align-items-center">
<strong>Total Papers</strong>
<span class="badge bg-primary rounded-pill">{{ total_papers }}</span>
</div>
{% for status, count in status_counts.items() %}
<div class="list-group-item d-flex justify-content-between align-items-center">
<strong>{{ status }}:</strong>
<span class="badge bg-primary rounded-pill">{{ count }}</span>
</div>
{% endfor %}
</div>
</div>
<!-- Pagination Section -->
<nav aria-label="Page navigation" class="mx-auto">
<ul class="pagination justify-content-center mb-0">
{% if pagination.has_prev %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=pagination.prev_num, **params) }}" aria-label="Previous">
<span aria-hidden="true">&laquo;</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">&laquo;</span>
</li>
{% endif %}
{% for page_num in pagination.iter_pages(left_edge=2, right_edge=2, left_current=2, right_current=2) %}
{% if page_num %}
<li class="page-item {% if page_num == pagination.page %}active{% endif %}">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=page_num, **params) }}">{{ page_num }}</a>
</li>
{% else %}
<li class="page-item disabled"><span class="page-link"></span></li>
{% endif %}
{% endfor %}
{% if pagination.has_next %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=pagination.next_num, **params) }}" aria-label="Next">
<span aria-hidden="true">&raquo;</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">&raquo;</span>
</li>
{% endif %}
</ul>
</nav>
<!-- Buttons Section -->
<div class="ms-auto">
<a href="{{ url_for('main.export_papers') }}" class="btn btn-outline-secondary">Export CSV</a>
</div>
</div>
<table class="table table-striped table-bordered table-smaller">
<thead>
<tr>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'title', 'sort_dir': title_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">Title</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'journal', 'sort_dir': journal_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">Journal</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'doi', 'sort_dir': doi_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">DOI</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'issn', 'sort_dir': issn_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">ISSN</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'status', 'sort_dir': status_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">Status</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'created_at', 'sort_dir': created_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">Created</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'updated_at', 'sort_dir': updated_sort}) or params %}
<a href="{{ url_for('main.list_papers', **params) }}">Updated</a>
</th>
</tr>
</thead>
<tbody>
{% for paper in papers %}
<tr>
<td><a href="#" class="paper-link" data-url="{{ url_for('main.paper_detail', paper_id=paper.id) }}">{{ paper.title }}</a></td>
<td>{{ paper.journal }}</td>
<td>{{ paper.doi }}</td>
<td>{{ paper.issn }}</td>
<td>{{ paper.status }}</td>
<td>{{ paper.created_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
<td>{{ paper.updated_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<nav aria-label="Page navigation">
<ul class="pagination justify-content-center">
{% if pagination.has_prev %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=pagination.prev_num, **params) }}" aria-label="Previous">
<span aria-hidden="true">&laquo;</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">&laquo;</span>
</li>
{% endif %}
{% for page_num in pagination.iter_pages(left_edge=2, right_edge=2, left_current=2, right_current=2) %}
{% if page_num %}
<li class="page-item {% if page_num == pagination.page %}active{% endif %}">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=page_num, **params) }}">{{ page_num }}</a>
</li>
{% else %}
<li class="page-item disabled"><span class="page-link"></span></li>
{% endif %}
{% endfor %}
{% if pagination.has_next %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('main.list_papers', page=pagination.next_num, **params) }}" aria-label="Next">
<span aria-hidden="true">&raquo;</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">&raquo;</span>
</li>
{% endif %}
</ul>
</nav>
<script>
document.addEventListener("DOMContentLoaded", function () {
const modal = new bootstrap.Modal(document.getElementById('paperDetailModal'));
const content = document.getElementById('paper-detail-content');
document.querySelectorAll('.paper-link').forEach(link => {
link.addEventListener('click', function (e) {
e.preventDefault();
const url = this.getAttribute('data-url');
fetch(url)
.then(response => response.text())
.then(html => {
content.innerHTML = html;
modal.show();
})
.catch(err => {
content.innerHTML = '<div class="modal-body text-danger">Error loading details.</div>';
modal.show();
});
});
});
});
</script>
{% endblock %}

@ -0,0 +1,281 @@
{% extends "base.html.jinja" %}
{% block title %}Papers{% endblock title %}
{% block content %}
{# --- Sort direction logic for each column --- #}
{% set title_sort = 'asc' if sort_by != 'title' or sort_dir == 'desc' else 'desc' %}
{% set journal_sort = 'asc' if sort_by != 'journal' or sort_dir == 'desc' else 'desc' %}
{% set doi_sort = 'asc' if sort_by != 'doi' or sort_dir == 'desc' else 'desc' %}
{% set issn_sort = 'asc' if sort_by != 'issn' or sort_dir == 'desc' else 'desc' %}
{% set status_sort = 'asc' if sort_by != 'status' or sort_dir == 'desc' else 'desc' %}
{% set created_sort = 'asc' if sort_by != 'created_at' or sort_dir == 'desc' else 'desc' %}
{% set updated_sort = 'asc' if sort_by != 'updated_at' or sort_dir == 'desc' else 'desc' %}
<form method="get" class="mb-4 row g-3">
<div class="col-md-2">
<label>Status</label>
<select name="status" class="form-select">
<option value="">All</option>
{% if request.args.get('status') == 'Pending' %}
<option value="Pending" selected>Pending</option>
{% else %}
<option value="Pending">Pending</option>
{% endif %}
{% if request.args.get('status') == 'Done' %}
<option value="Done" selected>Done</option>
{% else %}
<option value="Done">Done</option>
{% endif %}
{% if request.args.get('status') == 'Failed' %}
<option value="Failed" selected>Failed</option>
{% else %}
<option value="Failed">Failed</option>
{% endif %}
</select>
</div>
<div class="col-md-2">
<label>Created from</label>
<input type="date" name="created_from" class="form-control" value="{{ request.args.get('created_from', '') }}">
</div>
<div class="col-md-2">
<label>Created to</label>
<input type="date" name="created_to" class="form-control" value="{{ request.args.get('created_to', '') }}">
</div>
<div class="col-md-2">
<label>Updated from</label>
<input type="date" name="updated_from" class="form-control" value="{{ request.args.get('updated_from', '') }}">
</div>
<div class="col-md-2">
<label>Updated to</label>
<input type="date" name="updated_to" class="form-control" value="{{ request.args.get('updated_to', '') }}">
</div>
<div class="col-md-2 d-flex align-items-end">
<button type="submit" class="btn btn-primary w-100">Filter</button>
</div>
</form>
<div class="modal fade" id="paperDetailModal" tabindex="-1" aria-hidden="true">
<div class="modal-dialog modal-lg modal-dialog-scrollable">
<div class="modal-content" id="paper-detail-content">
<!-- AJAX-loaded content will go here -->
</div>
</div>
</div>
<div class="d-flex align-items-center mb-4">
<!-- Statistics Section -->
<div class="me-auto">
<div class="list-group list-group-horizontal">
<div class="list-group-item d-flex justify-content-between align-items-center">
<strong>Total Papers</strong>
<span class="badge bg-primary rounded-pill">{{ total_papers }}</span>
</div>
{% for status, count in status_counts.items() %}
<div class="list-group-item d-flex justify-content-between align-items-center">
<strong>{{ status }}:</strong>
<span class="badge bg-primary rounded-pill">{{ count }}</span>
</div>
{% endfor %}
</div>
</div>
<!-- Pagination Section -->
<nav aria-label="Page navigation" class="mx-auto">
<ul class="pagination justify-content-center mb-0">
{% if pagination.has_prev %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=pagination.prev_num, **params) }}"
aria-label="Previous">
<span aria-hidden="true">«</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">«</span>
</li>
{% endif %}
{% for page_num in pagination.iter_pages(left_edge=2, right_edge=2, left_current=2, right_current=2) %}
{% if page_num %}
<li class="page-item {% if page_num == pagination.page %}active{% endif %}">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=page_num, **params) }}">{{ page_num
}}</a>
</li>
{% else %}
<li class="page-item disabled"><span class="page-link">…</span></li>
{% endif %}
{% endfor %}
{% if pagination.has_next %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=pagination.next_num, **params) }}"
aria-label="Next">
<span aria-hidden="true">»</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">»</span>
</li>
{% endif %}
</ul>
</nav>
<!-- Buttons Section -->
<div class="ms-auto">
<a href="{{ url_for('papers.export_papers') }}" class="btn btn-outline-secondary">Export CSV</a>
</div>
</div>
<table class="table table-striped table-bordered table-smaller">
<thead>
<tr>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'title', 'sort_dir': title_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">Title</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'journal', 'sort_dir': journal_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">Journal</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'doi', 'sort_dir': doi_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">DOI</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'issn', 'sort_dir': issn_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">ISSN</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'status', 'sort_dir': status_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">Status</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'created_at', 'sort_dir': created_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">Created</a>
</th>
<th>
{% set params = request.args.to_dict() %}
{% set params = params.update({'sort_by': 'updated_at', 'sort_dir': updated_sort}) or params %}
<a href="{{ url_for('papers.list_papers', **params) }}">Updated</a>
</th>
</tr>
</thead>
<tbody>
{% for paper in papers %}
<tr>
<td>
<a href="#" class="icon-link icon-link-hover paper-link"
data-url="{{ url_for('papers.paper_detail', paper_id=paper.id) }}">
<svg xmlns="http://www.w3.org/2000/svg" class="bi" viewBox="0 0 16 16" aria-hidden="true">
<path
d="M4 1.5H3a2 2 0 0 0-2 2V14a2 2 0 0 0 2 2h10a2 2 0 0 0 2-2V3.5a2 2 0 0 0-2-2h-1v1h1a1 1 0 0 1 1 1V14a1 1 0 0 1-1 1H3a1 1 0 0 1-1-1V3.5a1 1 0 0 1 1-1h1v-1z" />
<path
d="M9.5 1a.5.5 0 0 1 .5.5v1a.5.5 0 0 1-.5.5h-3a.5.5 0 0 1-.5-.5v-1a.5.5 0 0 1 .5-.5h3zm-3-1A1.5 1.5 0 0 0 5 1.5v1A1.5 1.5 0 0 0 6.5 4h3A1.5 1.5 0 0 0 11 2.5v-1A1.5 1.5 0 0 0 9.5 0h-3z" />
</svg>
{{ paper.title }}
</a>
</td>
<td>{{ paper.journal }}</td>
<td>
<a href="https://doi.org/{{ paper.doi }}" target="_blank" class="icon-link icon-link-hover">
{{ paper.doi }}
<svg xmlns="http://www.w3.org/2000/svg" class="bi" viewBox="0 0 16 16" aria-hidden="true">
<path
d="M1 8a.5.5 0 0 1 .5-.5h11.793l-3.147-3.146a.5.5 0 0 1 .708-.708l4 4a.5.5 0 0 1 0 .708l-4 4a.5.5 0 0 1-.708-.708L13.293 8.5H1.5A.5.5 0 0 1 1 8z" />
</svg>
</a>
</td>
<td>{{ paper.issn }}</td>
<td>{{ paper.status }}</td>
<td>{{ paper.created_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
<td>{{ paper.updated_at.strftime('%Y-%m-%d %H:%M:%S') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<nav aria-label="Page navigation">
<ul class="pagination justify-content-center">
{% if pagination.has_prev %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=pagination.prev_num, **params) }}"
aria-label="Previous">
<span aria-hidden="true">«</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">«</span>
</li>
{% endif %}
{% for page_num in pagination.iter_pages(left_edge=2, right_edge=2, left_current=2, right_current=2) %}
{% if page_num %}
<li class="page-item {% if page_num == pagination.page %}active{% endif %}">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=page_num, **params) }}">{{ page_num }}</a>
</li>
{% else %}
<li class="page-item disabled"><span class="page-link">…</span></li>
{% endif %}
{% endfor %}
{% if pagination.has_next %}
<li class="page-item">
{% set params = request.args.to_dict() %}
{% set _ = params.pop('page', None) %}
<a class="page-link" href="{{ url_for('papers.list_papers', page=pagination.next_num, **params) }}"
aria-label="Next">
<span aria-hidden="true">»</span>
</a>
</li>
{% else %}
<li class="page-item disabled">
<span class="page-link" aria-hidden="true">»</span>
</li>
{% endif %}
</ul>
</nav>
<script>
document.addEventListener("DOMContentLoaded", function () {
const modal = new bootstrap.Modal(document.getElementById('paperDetailModal'));
const content = document.getElementById('paper-detail-content');
document.querySelectorAll('.paper-link').forEach(link => {
link.addEventListener('click', function (e) {
e.preventDefault();
const url = this.getAttribute('data-url');
fetch(url)
.then(response => response.text())
.then(html => {
content.innerHTML = html;
modal.show();
})
.catch(err => {
content.innerHTML = '<div class="modal-body text-danger">Error loading details.</div>';
modal.show();
});
});
});
});
</script>
{% endblock content %}
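
The `*_sort` variables at the top of this template all follow one rule: clicking the column that is already sorted ascending flips it to descending, and every other click requests ascending. The same rule can be sketched in plain Python. `next_sort_dir` and `header_params` are hypothetical names for illustration, and reading `sort_by`/`sort_dir` from the query args is an assumption about how the view derives them.

```python
def next_sort_dir(column, sort_by, sort_dir):
    """Direction a column header link should request when clicked."""
    # Mirrors the template expression:
    #   'asc' if sort_by != column or sort_dir == 'desc' else 'desc'
    if sort_by != column or sort_dir == "desc":
        return "asc"
    return "desc"


def header_params(args, column):
    """Copy the current query args and point the sort at `column`."""
    params = dict(args)  # keep active filters such as 'status'
    params["sort_by"] = column
    params["sort_dir"] = next_sort_dir(
        column, args.get("sort_by", ""), args.get("sort_dir", "asc")
    )
    return params
```

Copying the args before updating them keeps the active filters in the generated link, which is what the `request.args.to_dict()` plus `params.update(...)` pattern in the header cells does.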

@ -0,0 +1,18 @@
<div class="modal-header">
<h5 class="modal-title">Log Details</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal"></button>
</div>
<div class="modal-body">
<p><strong>Timestamp:</strong> {{ log.timestamp }}</p>
<p><strong>Category:</strong> {{ log.category }}</p>
<p><strong>Action:</strong> {{ log.action }}</p>
<p><strong>Description:</strong> {{ log.description }}</p>
{% if log.extra_data %}
<p><strong>Extra Data:</strong></p>
<pre><code>{{ log.extra_data }}</code></pre>
{% endif %}
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
</div>

@ -1,14 +0,0 @@
<div class="modal-header">
<h5 class="modal-title">{{ paper.title }}</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal"></button>
</div>
<div class="modal-body">
{% for key, value in paper.__dict__.items() %}
{% if not key.startswith('_') and key != 'metadata' %}
<p><strong>{{ key.replace('_', ' ').capitalize() }}:</strong> {{ value }}</p>
{% endif %}
{% endfor %}
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
</div>

@ -0,0 +1,32 @@
<div class="modal-header">
<h5 class="modal-title">{{ paper.title }}</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal"></button>
</div>
<div class="modal-body">
{% for key, value in paper.__dict__.items() %} {% if key == 'doi' %}
<p>
<strong>DOI:</strong>
<a href="https://doi.org/{{ value }}" target="_blank">{{ value }}</a>
</p>
{% elif key == 'issn' %} {% if value and ',' in value %}
<p>
<strong>ISSN:</strong>
{% for issn in value.split(',') %}
<a href="https://www.worldcat.org/search?q=issn:{{ issn.strip() }}" target="_blank">{{ issn.strip() }}</a>{% if not
loop.last %}, {% endif %} {% endfor %}
</p>
{% else %}
<p>
<strong>ISSN:</strong>
<a href="https://www.worldcat.org/search?q=issn:{{ value }}" target="_blank">{{ value }}</a>
</p>
{% endif %} {% endif %} {% if not key.startswith('_') and key != 'metadata'
and key != 'doi' and key != 'issn' %}
<p><strong>{{ key.replace('_', ' ').capitalize() }}:</strong> {{ value }}</p>
{% endif %} {% endfor %}
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">
Close
</button>
</div>
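
The ISSN branch above splits a comma-separated value and links each entry to a WorldCat search. A small Python sketch of the same transformation, which also tolerates an empty or missing value, could look like this. `issn_links` is a hypothetical helper and is not part of this change; the URL prefix is the one used in the template.

```python
from urllib.parse import quote

# Same search URL prefix as in the template above.
WORLDCAT_SEARCH = "https://www.worldcat.org/search?q=issn:"


def issn_links(value):
    """Turn 'a, b' into [(issn, url), ...]; None or '' yields an empty list."""
    if not value:
        return []
    issns = [part.strip() for part in value.split(",") if part.strip()]
    # quote() is a precaution in case a stored value contains unexpected characters.
    return [(issn, WORLDCAT_SEARCH + quote(issn)) for issn in issns]
```

Moving this logic out of the template would let the single-value and multi-value cases share one code path.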

@ -1,15 +1,17 @@
{% extends 'base.html' %} {% block content %}
{% extends "base.html.jinja" %} {% block content %}
<style>
.timeline {
display: flex;
flex-wrap: wrap;
gap: 3px;
user-select: none; /* Prevent text selection during drag */
user-select: none;
/* Prevent text selection during drag */
}
.hour-block {
width: 49px;
height: 70px; /* Increased height to fit additional text */
height: 70px;
/* Increased height to fit additional text */
border-radius: 5px;
text-align: center;
line-height: 1.2;
@ -76,11 +78,8 @@
messages %}
<div id="flash-messages">
{% for category, message in messages %}
<div
class="flash-message {{ category }}"
x-data="{}"
x-init="setTimeout(() => $el.classList.add('fade'), 100); setTimeout(() => $el.remove(), 5000)"
>
<div class="flash-message {{ category }}" x-data="{}"
x-init="setTimeout(() => $el.classList.add('fade'), 100); setTimeout(() => $el.remove(), 5000)">
{{ message }}
</div>
{% endfor %}
@ -118,39 +117,21 @@
<strong x-text="volume"></strong> papers.
</p>
<div class="d-flex align-items-center mb-3">
<form
method="POST"
action="{{ url_for('main.schedule') }}"
class="input-group w-50"
>
<form method="post" action="{{ url_for('schedule.schedule') }}" class="input-group w-50">
<label class="input-group-text">Papers per day:</label>
<input
type="number"
class="form-control"
name="total_volume"
value="{{ volume }}"
min="1"
max="1000"
required
/>
<input type="number" class="form-control" name="total_volume" value="{{ volume }}" min="1" max="1000"
required />
<button type="submit" class="btn btn-primary">Update Volume</button>
</form>
</div>
</div>
<h2 class="mt-4">Current Schedule</h2>
<form method="POST" action="{{ url_for('main.schedule') }}">
<form method="post" action="{{ url_for('schedule.schedule') }}">
<div class="timeline mb-3" @mouseup="endDrag()" @mouseleave="endDrag()">
<template x-for="hour in Object.keys(schedule)" :key="hour">
<div
class="hour-block"
:id="'hour-' + hour"
:data-hour="hour"
:style="getBackgroundStyle(hour)"
:class="{'selected': isSelected(hour)}"
@mousedown="startDrag($event, hour)"
@mouseover="dragSelect(hour)"
>
<div class="hour-block" :id="'hour-' + hour" :data-hour="hour" :style="getBackgroundStyle(hour)"
:class="{'selected': isSelected(hour)}" @mousedown="startDrag($event, hour)" @mouseover="dragSelect(hour)">
<div><strong x-text="formatHour(hour)"></strong></div>
<div class="weight"><span x-text="schedule[hour]"></span></div>
<div class="papers">
@ -163,25 +144,14 @@
<div class="input-group mb-4 w-50">
<label class="input-group-text">Set Weight:</label>
<input
type="number"
step="0.1"
min="0"
max="5"
x-model="newWeight"
class="form-control"
/>
<button
type="button"
class="btn btn-outline-primary"
@click="applyWeight()"
>
<input type="number" step="0.1" min="0" max="5" x-model="newWeight" class="form-control" />
<button type="button" class="btn btn-outline-primary" @click="applyWeight()">
Apply to Selected
</button>
</div>
<div class="d-flex justify-content-between">
<a href="/" class="btn btn-outline-secondary">⬅ Back</a>
<a href="{{ url_for('main.index') }}" class="btn btn-outline-secondary">⬅ Back</a>
<button type="submit" class="btn btn-success">💾 Save Schedule</button>
</div>
</form>
@ -297,4 +267,4 @@
};
}
</script>
{% endblock %}
{% endblock content %}


@ -1,73 +0,0 @@
{% extends 'base.html' %}
{% block content %}
<h1>Welcome to SciPaperLoader</h1>
{% if success %}
<div class="alert alert-success mt-3">{{ success }}</div>
{% endif %}
{% if error_message %}
<div class="alert alert-warning mt-3">
<h4>{{ error_message }}</h4>
<table class="table table-sm table-bordered">
<thead>
<tr>
<th>Row</th>
<th>DOI</th>
<th>Error</th>
</tr>
</thead>
<tbody>
{% for error in error_samples %}
<tr>
<td>{{ error.row }}</td>
<td>{{ error.doi }}</td>
<td>{{ error.error }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<a href="{{ url_for('main.download_error_log') }}" class="btn btn-outline-secondary">Download Full Error Log</a>
</div>
{% endif %}
<div class="alert alert-info">
<p><strong>Instructions:</strong> Please upload a CSV file containing academic paper metadata. The file must include the following columns:</p>
<ul>
<li><code>alternative_id</code>: an alternative title or abbreviation</li>
<li><code>journal</code>: the journal name</li>
<li><code>doi</code>: the digital object identifier</li>
<li><code>issn</code>: the ISSN of the journal</li>
<li><code>title</code>: the title of the paper</li>
</ul>
<p>The format of your CSV should resemble the response structure of the Crossref API's <code>/journals/{issn}/works</code> endpoint.</p>
</div>
<form method="POST" action="{{ url_for('main.upload') }}" enctype="multipart/form-data">
<div class="mb-3">
<label class="form-label">How to handle duplicate DOIs:</label>
<div class="form-check">
<input class="form-check-input" type="radio" name="duplicate_strategy" value="skip" id="skip" checked>
<label class="form-check-label" for="skip">Skip duplicate entries</label>
</div>
<div class="form-check">
<input class="form-check-input" type="radio" name="duplicate_strategy" value="update" id="update">
<label class="form-check-label" for="update">Update existing entries</label>
</div>
</div>
<div class="form-group">
<label for="file">Upload CSV File</label>
<input type="file" name="file" id="file" class="form-control" required>
</div>
<div class="form-group mt-3">
<label for="delimiter">Choose CSV Delimiter</label>
<select name="delimiter" id="delimiter" class="form-control">
<option value=",">Comma (,)</option>
<option value=";">Semicolon (;)</option>
<option value="\t">Tab (\t)</option>
<option value="|">Pipe (|)</option>
</select>
</div>
<button type="submit" class="btn btn-primary mt-3">Upload</button>
</form>
{% endblock %}


@ -0,0 +1,234 @@
{% extends "base.html.jinja" %} {% block content %}
<h1>Welcome to SciPaperLoader</h1>
<div id="results-container"></div>
{% if success %}
<div class="alert alert-success mt-3">{{ success }}</div>
{% endif %} {% if error_message %}
<div class="alert alert-warning mt-3">
<h4>{{ error_message }}</h4>
<table class="table table-sm table-bordered">
<thead>
<tr>
<th>Row</th>
<th>DOI</th>
<th>Error</th>
</tr>
</thead>
<tbody>
{% for error in error_samples %}
<tr>
<td>{{ error.row }}</td>
<td>{{ error.doi }}</td>
<td>{{ error.error }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<a href="{{ url_for('upload.download_error_log') }}" class="btn btn-outline-secondary">Download Full Error Log</a>
</div>
{% endif %}
<div class="alert alert-info">
<p>
<strong>Instructions:</strong> Please upload a CSV file containing academic
paper metadata. The file must include the following columns:
</p>
<ul>
<li><code>alternative_id</code>: an alternative title or abbreviation</li>
<li><code>journal</code>: the journal name</li>
<li><code>doi</code>: the digital object identifier</li>
<li><code>issn</code>: the ISSN of the journal</li>
<li><code>title</code>: the title of the paper</li>
</ul>
</div>
<form method="post" action="{{ url_for('upload.upload') }}" enctype="multipart/form-data" id="upload-form">
<div class="form-group">
<label for="file">Upload CSV File</label>
<input type="file" name="file" id="file" class="form-control" required />
</div>
<div class="form-group mt-3">
<label for="delimiter">Choose CSV Delimiter</label>
<select name="delimiter" id="delimiter" class="form-control">
<option value=",">Comma (,)</option>
<option value=";">Semicolon (;)</option>
<option value="\t">Tab (\t)</option>
<option value="|">Pipe (|)</option>
</select>
</div>
<button type="submit" class="btn btn-primary mt-3">Upload</button>
</form>
<!-- Progress Modal -->
<div id="progressModal" class="modal fade" tabindex="-1">
<div class="modal-dialog">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title">Processing Your Upload</h5>
</div>
<div class="modal-body">
<div class="progress">
<div id="progressBar" class="progress-bar" role="progressbar">0%</div>
</div>
<p id="progressStatus" class="mt-2 text-center">Starting...</p>
</div>
</div>
</div>
</div>
<script>
const form = document.getElementById("upload-form");
form.addEventListener("submit", function (e) {
e.preventDefault();
// Display loading state immediately
const progressModal = new bootstrap.Modal(document.getElementById("progressModal"));
progressModal.show();
const progressBar = document.getElementById("progressBar");
progressBar.style.width = "5%";
progressBar.textContent = "Starting...";
const formData = new FormData(form);
// Disable the form while processing
const submitButton = form.querySelector("button[type='submit']");
submitButton.disabled = true;
fetch(form.action, {
method: "POST",
body: formData,
})
.then((response) => response.json())
.then((data) => {
if (data.error) {
// Handle error
progressModal.hide();
alert(`Error: ${data.error}`);
submitButton.disabled = false;
return;
}
const taskId = data.task_id;
const interval = setInterval(() => {
fetch("{{ url_for('upload.task_status', task_id='') }}" + taskId)
.then((response) => response.json())
.then((status) => {
console.log("Task status:", status);
if (status.state === "SUCCESS") {
clearInterval(interval);
progressBar.style.width = "100%";
progressBar.textContent = "Completed!";
setTimeout(() => {
progressModal.hide();
showResults(status.result);
submitButton.disabled = false;
}, 1000);
} else if (status.state === "FAILURE") {
clearInterval(interval);
progressBar.style.width = "100%";
progressBar.classList.add("bg-danger");
progressBar.textContent = "Failed!";
setTimeout(() => {
progressModal.hide();
alert(`Task failed: ${status.error || "Unknown error"}`);
submitButton.disabled = false;
}, 1000);
} else {
// Update progress bar with more information
const progress = status.progress || 0;
progressBar.style.width = `${progress}%`;
progressBar.textContent = `${progress}% complete`;
document.getElementById("progressStatus").innerText = `Processing... (${status.state})`;
}
})
.catch((err) => {
console.error("Failed to check task status:", err);
});
}, 1000);
})
.catch((err) => {
console.error("Upload failed:", err);
progressModal.hide();
alert("Upload failed. Please try again.");
submitButton.disabled = false;
});
});
const showResults = (result) => {
const message = `Upload completed! Added: ${result.added}, Updated: ${result.updated}, Skipped: ${result.skipped}, Errors: ${result.error_count}`;
let resultHTML = `<div class="alert alert-success">${message}</div>`;
// Add skipped records information
if (result.skipped > 0) {
resultHTML += `
<div class="alert alert-info">
<h4>${result.skipped} records were skipped</h4>
<p>${result.skipped_reason_summary || "Records were skipped because they already exist in the database."}</p>
${result.skipped_records && result.skipped_records.length > 0 ? `
<p>Examples of skipped records:</p>
<table class="table table-sm table-bordered">
<thead>
<tr>
<th>Row</th>
<th>DOI</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
${result.skipped_records.map(record => `
<tr>
<td>${record.row}</td>
<td>${record.doi}</td>
<td>${record.reason}</td>
</tr>
`).join('')}
</tbody>
</table>
` : ''}
</div>`;
}
// Existing error display code
if (result.error_count > 0) {
resultHTML += `
<div class="alert alert-warning">
<h4>Some errors occurred (${result.error_count} total)</h4>
<p>Showing first ${result.errors.length} of ${result.error_count} errors:</p>
<table class="table table-sm table-bordered">
<thead>
<tr>
<th>Row</th>
<th>DOI</th>
<th>Error</th>
</tr>
</thead>
<tbody>`;
result.errors.forEach(error => {
resultHTML += `
<tr>
<td>${error.row}</td>
<td>${error.doi}</td>
<td>${error.error}</td>
</tr>`;
});
resultHTML += `
</tbody>
</table>
<p class="mt-2">Download the complete error log with all ${result.error_count} errors:</p>
<a href="/upload/download_error_log/${result.task_id}" class="btn btn-outline-secondary">
Download Full Error Log
</a>
</div>`;
}
document.getElementById("results-container").innerHTML = resultHTML;
};
</script>
{% endblock content %}


@ -1,389 +0,0 @@
import codecs
import csv
import datetime
import io
from io import StringIO
import pandas as pd
from flask import (
Blueprint,
current_app,
flash,
redirect,
render_template,
request,
send_file,
session,
url_for,
)
from sqlalchemy import asc, desc
from .db import db
from .models import PaperMetadata, ScheduleConfig, VolumeConfig
bp = Blueprint("main", __name__)
@bp.route("/")
def index():
return render_template("index.html")
REQUIRED_COLUMNS = {"alternative_id", "journal", "doi", "issn", "title"}
@bp.route("/upload", methods=["GET", "POST"])
def upload():
if request.method == "POST":
file = request.files.get("file")
delimiter = request.form.get("delimiter", ",")
duplicate_strategy = request.form.get("duplicate_strategy", "skip")
if not file:
# The template renders `error_message`, not `error`
return render_template("upload.html", error_message="No file selected.")
try:
stream = codecs.iterdecode(file.stream, "utf-8")
content = "".join(stream)
df = pd.read_csv(StringIO(content), delimiter=delimiter)
except Exception as e:
return render_template("upload.html", error_message=f"Failed to read CSV file: {e}")
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
return render_template(
"upload.html", error_message=f"Missing required columns: {', '.join(sorted(missing))}"
)
# Optional: parse 'published_online' to date
def parse_date(val):
if pd.isna(val):
return None
try:
return pd.to_datetime(val).date()
except Exception:
return None
# Count statistics
added_count = 0
skipped_count = 0
updated_count = 0
error_count = 0
# Collect error information
errors = []
# Process each row
for index, row in df.iterrows():
try:
# Get DOI from row for error reporting
doi = str(row.get("doi", "N/A"))
# Validate required fields
for field in ["title", "doi", "issn"]:
if pd.isna(row.get(field)) or not str(row.get(field)).strip():
raise ValueError(f"Missing required field: {field}")
# Check if paper with this DOI already exists
existing = PaperMetadata.query.filter_by(doi=doi).first()
if existing:
if duplicate_strategy == 'update':
# Update existing record
existing.title = row["title"]
existing.alt_id = row.get("alternative_id")
existing.issn = row["issn"]
existing.journal = row.get("journal")
existing.type = row.get("type")
existing.language = row.get("language")
existing.published_online = parse_date(row.get("published_online"))
updated_count += 1
else:
# Skip this record
skipped_count += 1
continue
else:
# Create new record
metadata = PaperMetadata(
title=row["title"],
doi=doi,
alt_id=row.get("alternative_id"),
issn=row["issn"],
journal=row.get("journal"),
type=row.get("type"),
language=row.get("language"),
published_online=parse_date(row.get("published_online")),
status="New",
file_path=None,
error_msg=None,
)
db.session.add(metadata)
added_count += 1
except Exception as e:
error_count += 1
errors.append({
"row": index + 2, # +2 because index is 0-based and we have a header row
"doi": row.get("doi", "N/A"),
"error": str(e)
})
continue # Skip this row and continue with the next
try:
db.session.commit()
except Exception as e:
db.session.rollback()
return render_template(
"upload.html", error_message=f"Failed to save data to database: {e}"
)
# Prepare error samples for display
error_samples = errors[:5] if errors else []
error_message = None
if errors:
error_message = f"Encountered {len(errors)} errors. First 5 shown below."
# Store the full errors list in the session for potential download
if errors:
error_csv = StringIO()
writer = csv.DictWriter(error_csv, fieldnames=["row", "doi", "error"])
writer.writeheader()
writer.writerows(errors)
session["error_data"] = error_csv.getvalue()
return render_template(
"upload.html",
success=f"File processed! Added: {added_count}, Updated: {updated_count}, Skipped: {skipped_count}, Errors: {error_count}",
error_message=error_message,
error_samples=error_samples
)
return render_template("upload.html")
# Add a route to download the error log
@bp.route("/download_error_log")
def download_error_log():
error_data = session.get("error_data")
if not error_data:
flash("No error data available.")
return redirect(url_for("main.upload"))
# send_file requires a binary stream, so encode the CSV text first
buffer = io.BytesIO(error_data.encode("utf-8"))
return send_file(
buffer,
mimetype="text/csv",
as_attachment=True,
download_name=f"upload_errors_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
)
@bp.route("/papers")
def list_papers():
page = request.args.get("page", 1, type=int)
per_page = 50
# Filters
status = request.args.get("status")
created_from = request.args.get("created_from")
created_to = request.args.get("created_to")
updated_from = request.args.get("updated_from")
updated_to = request.args.get("updated_to")
sort_by = request.args.get("sort_by", "created_at")
sort_dir = request.args.get("sort_dir", "desc")
query = PaperMetadata.query
# Apply filters
if status:
query = query.filter(PaperMetadata.status == status)
def parse_date(val):
from datetime import datetime
try:
return datetime.strptime(val, "%Y-%m-%d")
except (ValueError, TypeError):
return None
if created_from := parse_date(created_from):
query = query.filter(PaperMetadata.created_at >= created_from)
if created_to := parse_date(created_to):
query = query.filter(PaperMetadata.created_at <= created_to)
if updated_from := parse_date(updated_from):
query = query.filter(PaperMetadata.updated_at >= updated_from)
if updated_to := parse_date(updated_to):
query = query.filter(PaperMetadata.updated_at <= updated_to)
# Sorting
sort_col = getattr(PaperMetadata, sort_by, PaperMetadata.created_at)
sort_func = desc if sort_dir == "desc" else asc
query = query.order_by(sort_func(sort_col))
# Pagination
pagination = query.paginate(page=page, per_page=per_page, error_out=False)
# Statistics
total_papers = PaperMetadata.query.count()
status_counts = (
db.session.query(PaperMetadata.status, db.func.count(PaperMetadata.status))
.group_by(PaperMetadata.status)
.all()
)
status_counts = {status: count for status, count in status_counts}
return render_template(
"papers.html",
papers=pagination.items,
pagination=pagination,
total_papers=total_papers,
status_counts=status_counts,
sort_by=sort_by,
sort_dir=sort_dir,
)
@bp.route("/papers/export")
def export_papers():
query = PaperMetadata.query
# Filters
status = request.args.get("status")
created_from = request.args.get("created_from")
created_to = request.args.get("created_to")
updated_from = request.args.get("updated_from")
updated_to = request.args.get("updated_to")
sort_by = request.args.get("sort_by", "created_at")
sort_dir = request.args.get("sort_dir", "desc")
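# Apply filters
if status:
query = query.filter(PaperMetadata.status == status)
def parse_date(val):
try:
return datetime.datetime.strptime(val, "%Y-%m-%d")
except (ValueError, TypeError):
return None
# Apply the date-range filters, mirroring list_papers
if created_from := parse_date(created_from):
query = query.filter(PaperMetadata.created_at >= created_from)
if created_to := parse_date(created_to):
query = query.filter(PaperMetadata.created_at <= created_to)
if updated_from := parse_date(updated_from):
query = query.filter(PaperMetadata.updated_at >= updated_from)
if updated_to := parse_date(updated_to):
query = query.filter(PaperMetadata.updated_at <= updated_to)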
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(
["ID", "Title", "Journal", "DOI", "ISSN", "Status", "Created At", "Updated At"]
)
for paper in query:
writer.writerow(
[
paper.id,
paper.title,
getattr(paper, "journal", ""),
paper.doi,
paper.issn,
paper.status,
paper.created_at,
paper.updated_at,
]
)
output.seek(0)
return send_file(
io.BytesIO(output.read().encode("utf-8")),
mimetype="text/csv",
as_attachment=True,
download_name="papers.csv",
)
@bp.route("/papers/<int:paper_id>/detail")
def paper_detail(paper_id):
paper = PaperMetadata.query.get_or_404(paper_id)
return render_template("partials/paper_detail_modal.html", paper=paper)
@bp.route("/schedule", methods=["GET", "POST"])
def schedule():
if request.method == "POST":
try:
# Check if we're updating volume or schedule
if "total_volume" in request.form:
# Volume update
try:
new_volume = float(request.form.get("total_volume", 0))
if new_volume <= 0 or new_volume > 1000:
raise ValueError("Volume must be between 1 and 1000")
volume_config = VolumeConfig.query.first()
if not volume_config:
volume_config = VolumeConfig(volume=new_volume)
db.session.add(volume_config)
else:
volume_config.volume = new_volume
db.session.commit()
flash("Volume updated successfully!", "success")
except ValueError as e:
db.session.rollback()
flash(f"Error updating volume: {str(e)}", "error")
else:
# Schedule update logic
# Validate form data
for hour in range(24):
key = f"hour_{hour}"
if key not in request.form:
raise ValueError(f"Missing data for hour {hour}")
try:
weight = float(request.form.get(key, 0))
except ValueError:
raise ValueError(f"Invalid weight value for hour {hour}")
# Range check sits outside the try so its message is not overwritten
if weight < 0 or weight > 5:
raise ValueError(
f"Weight for hour {hour} must be between 0 and 5"
)
# Update database if validation passes
for hour in range(24):
key = f"hour_{hour}"
weight = float(request.form.get(key, 0))
config = ScheduleConfig.query.get(hour)
if config:
config.weight = weight
else:
db.session.add(ScheduleConfig(hour=hour, weight=weight))
db.session.commit()
flash("Schedule updated successfully!", "success")
except ValueError as e:
db.session.rollback()
flash(f"Error updating schedule: {str(e)}", "error")
schedule = {
sc.hour: sc.weight
for sc in ScheduleConfig.query.order_by(ScheduleConfig.hour).all()
}
volume = VolumeConfig.query.first()
return render_template(
"schedule.html",
schedule=schedule,
volume=volume.volume,
app_title="PaperScraper",
)
@bp.route("/logs")
def logs():
return render_template("logs.html", app_title="PaperScraper")
@bp.route("/about")
def about():
return render_template("about.html", app_title="PaperScraper")

1641
testdata.csv Normal file

File diff suppressed because it is too large