mischbeck/SciPaperLoader

Michael Beck 11f086aa64 implements download path configuration

2025-04-16 22:03:17 +02:00

SciPaperLoader: Flask Application Initial Structure

Project Overview

SciPaperLoader is a Flask-based web application for managing scientific papers. Its web interface uses Jinja2 templates, with Alpine.js for interactive UI components and HTMX for partial page updates without full reloads. The application has two main parts: a Flask web app, which serves pages for uploading data, configuring schedules, and viewing logs, and a background scraper daemon, which runs independently and performs long-running tasks such as fetching paper details on a schedule. The project follows Flask best practices (blueprints, separate static files and templates) and is set up for easy development and testing, with configuration files and a pytest fixture.

Quick Start

Run the application:

make run

Then open http://localhost:5000/ in your browser.

Prerequisites

Python >=3.8
Redis (for Celery task queue)

Development environment

make venv: creates a virtualenv with dependencies and this application installed in development mode
make run: runs a development server in debug mode (changes in source code are reloaded automatically)
make format: reformats code
make lint: runs flake8
make mypy: runs type checks by mypy
make test: runs tests (see also: Testing Flask Applications)
make dist: creates a wheel distribution (will run tests first)
make clean: removes virtualenv and build artifacts
To change dependencies, add application dependencies to pyproject.toml under project.dependencies and development dependencies under project.optional-dependencies.*, then run make clean && make venv to reinstall the environment.

Asynchronous Task Processing with Celery

SciPaperLoader uses Celery for processing large CSV uploads and other background tasks. This allows the application to handle large datasets reliably without blocking the web interface.

Running Celery Components

make redis: ensures Redis server is running (required for Celery)
make celery: starts a Celery worker to process background tasks
make celery-flower: starts Flower, a web interface for monitoring Celery tasks at http://localhost:5555
make run-all: runs the entire stack (Flask app + Celery worker + Redis) in development mode

How It Works

When you upload a CSV file through the web interface:

The file is sent to the server
A Celery task is created to process the file asynchronously
The browser shows a progress bar with real-time updates
The results are displayed when processing is complete

This architecture allows SciPaperLoader to handle CSV files with thousands of papers without timing out or blocking the web interface.
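The upload flow above can be sketched in plain Python. This is a minimal illustration and not the project's actual task code: the function name process_csv, the "doi" column, and the validation rule are all assumptions made for the example. In a real Celery worker, the progress callback would typically wrap self.update_state(state="PROGRESS", meta={...}) on a bound task, so that the web interface can poll the task state and update the progress bar.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def process_csv(
    lines: Iterable[str],
    report_progress: Callable[[int, int], None],
    chunk_size: int = 100,
) -> Dict[str, int]:
    """Process CSV rows and report progress after every chunk.

    Hypothetical helper: in a Celery task, report_progress would usually
    forward to self.update_state(state="PROGRESS", meta={...}).
    """
    rows: List[dict] = list(csv.DictReader(lines))
    total = len(rows)
    processed = 0
    skipped = 0
    for index, row in enumerate(rows, start=1):
        # Assumed validation rule: a row needs a non-empty "doi" column.
        if (row.get("doi") or "").strip():
            processed += 1
        else:
            skipped += 1
        # Report after each full chunk and once more at the very end.
        if index % chunk_size == 0 or index == total:
            report_progress(index, total)
    return {"total": total, "processed": processed, "skipped": skipped}


# Usage with an in-memory file; the callback just records each update.
sample = io.StringIO("title,doi\nPaper A,10.1/a\nPaper B,\nPaper C,10.1/c\n")
updates = []
result = process_csv(
    sample, lambda done, total: updates.append((done, total)), chunk_size=2
)
print(result)
print(updates)
```

Reporting per chunk rather than per row keeps the number of state updates small for files with thousands of rows.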

Configuration

Default configuration is loaded from scipaperloader.defaults and can be overridden by environment variables with a FLASK_ prefix. See Configuring from Environment Variables.
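The override behaviour can be illustrated with a small standard-library sketch. Flask itself offers app.config.from_prefixed_env() for this purpose; whether SciPaperLoader calls it exactly that way is an assumption. The helper load_config, the PAPERS_PER_BATCH key, and the redis.internal host below are hypothetical and exist only for the example.

```python
import json
from typing import Any, Dict, Mapping


def load_config(
    defaults: Mapping[str, Any],
    environ: Mapping[str, str],
    prefix: str = "FLASK_",
) -> Dict[str, Any]:
    """Overlay prefixed environment variables on top of default settings."""
    config = dict(defaults)
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as PATH are ignored
        key = name[len(prefix):]
        try:
            # Values that parse as JSON become typed (numbers, booleans, ...).
            value = json.loads(raw)
        except json.JSONDecodeError:
            # Anything else, such as a URL, is kept as a plain string.
            value = raw
        config[key] = value
    return config


defaults = {
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "PAPERS_PER_BATCH": 50,  # hypothetical setting
}
environ = {
    "FLASK_CELERY_BROKER_URL": "redis://redis.internal:6379/1",  # hypothetical host
    "FLASK_PAPERS_PER_BATCH": "200",
    "PATH": "/usr/bin",
}
config = load_config(defaults, environ)
print(config)
```

Passing the environment as a mapping, instead of reading os.environ inside the function, keeps the sketch easy to test.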

Celery Configuration

The following environment variables can be set to configure Celery:

FLASK_CELERY_BROKER_URL: Redis URL for the message broker (default: redis://localhost:6379/0)
FLASK_CELERY_RESULT_BACKEND: Redis URL for storing task results (default: redis://localhost:6379/0)

Consider using dotenv to manage these environment variables.

Database Migrations with Flask-Migrate

SciPaperLoader uses Flask-Migrate (based on Alembic) to handle database schema changes. This allows for version-controlled database updates that can be applied or rolled back as needed.

Database Migration Commands

make db-migrate message="Description of changes": Create a new migration script based on detected model changes
make db-upgrade: Apply all pending migration scripts to the database
make db-downgrade: Revert the most recent migration
make reset-db: Reset the database completely (delete, initialize, and migrate)

Working with Migrations

When you make changes to the database models (in models.py):

Create a migration: make db-migrate message="Add user roles table"
Review the generated migration script in the migrations/versions/ directory
Apply the migration: make db-upgrade
To roll back a problematic migration: make db-downgrade

In production, always back up the database with make backup-db before applying migrations.

Deployment

See Deploying to Production.

You may use the distribution (make dist) to publish it to a package index, deliver it to your server, or copy it into your Docker image, and install it with pip.

You must set a SECRET_KEY in production to a secret and stable value.
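One way to produce such a value is Python's secrets module. The sketch below assumes the key is supplied through the same FLASK_ prefix convention described under Configuration, which would make the variable name FLASK_SECRET_KEY; that name is an inference, not something this README states.

```python
import secrets

# Generate 32 random bytes, rendered as a 64-character hex string.
secret_key = secrets.token_hex(32)

# Print a line that can be pasted into the production environment
# (variable name assumed from the FLASK_ prefix convention).
print(f"FLASK_SECRET_KEY={secret_key}")
```

Generate the key once and store it with your deployment secrets. Creating a new key at every start would invalidate existing sessions, which is why the value must be stable.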

Deploying with Celery

When deploying to production:

Configure a production-ready Redis instance or use a managed service
Run Celery workers as system services or in Docker containers
Consider setting up monitoring for your Celery tasks and workers