# SciPaperLoader: Flask Application Initial Structure

## Project Overview

**SciPaperLoader** is a Flask-based web application for managing scientific papers. It provides a web interface (with Jinja2 templates) enhanced by **Alpine.js** for interactive UI components and **HTMX** for partial page updates without full reloads. The application is composed of two main parts: a Flask web app (serving pages for uploading data, configuring schedules, and viewing logs) and a background **scraper daemon** that runs independently to perform long-running tasks (like fetching paper details on a schedule). The project is organized following Flask best practices (using blueprints, separating static files and templates) and is set up for easy development and testing (with configuration files and a pytest test fixture).
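
Here is a minimal sketch of an application factory that registers one blueprint, to illustrate the blueprint-based layout. The blueprint name `logs`, the `/logs/` route, and the `create_app(test_config)` signature are hypothetical and are not taken from the SciPaperLoader source. A real view would render a Jinja2 template rather than return a string.

```python
from typing import Any, Dict, Optional

from flask import Blueprint, Flask

# Hypothetical blueprint; the real project's blueprint names and routes may differ.
logs_bp = Blueprint("logs", __name__, url_prefix="/logs")


@logs_bp.route("/")
def list_logs() -> str:
    # A real view would call render_template(...) with a Jinja2 template here.
    return "log entries"


def create_app(test_config: Optional[Dict[str, Any]] = None) -> Flask:
    """Application factory: build a Flask app and register its blueprints."""
    app = Flask(__name__)
    if test_config is not None:
        # Lets tests (e.g. a pytest fixture) inject settings such as TESTING=True.
        app.config.update(test_config)
    app.register_blueprint(logs_bp)
    return app
```

A pytest fixture could call `create_app({"TESTING": True})` and then use `app.test_client()` to request `/logs/`.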

## Quick Start

Run the application:

    make run

Then open [http://localhost:5000/](http://localhost:5000/) in your browser.

## Prerequisites

- Python >=3.8
- Redis (message broker and result backend for Celery)

## Development environment

 - `make venv`: creates a virtualenv with dependencies and this application
   installed in [development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode)

 - `make run`: runs a development server in debug mode (changes in source code
   are reloaded automatically)

 - `make format`: reformats code

 - `make lint`: runs flake8

 - `make mypy`: runs type checks by mypy

 - `make test`: runs tests (see also: [Testing Flask Applications](https://flask.palletsprojects.com/en/3.0.x/testing/))

 - `make dist`: creates a wheel distribution (will run tests first)

 - `make clean`: removes virtualenv and build artifacts

 - add application dependencies in `pyproject.toml` under `project.dependencies`;
   add development dependencies under `project.optional-dependencies.*`; run
   `make clean && make venv` to reinstall the environment

## Asynchronous Task Processing with Celery

SciPaperLoader uses Celery for processing large CSV uploads and other background tasks. This allows the application to handle large datasets reliably without blocking the web interface.

### Running Celery Components

- `make redis`: ensures the Redis server is running (required for Celery)

- `make celery`: starts a Celery worker to process background tasks

- `make celery-flower`: starts Flower, a web interface for monitoring Celery tasks at http://localhost:5555

- `make run-all`: runs the entire stack (Flask app + Celery worker + Redis) in development mode

### How It Works

When you upload a CSV file through the web interface:

1. The file is sent to the server
2. A Celery task is created to process the file asynchronously
3. The browser shows a progress bar with real-time updates
4. The results are displayed when processing is complete

This architecture allows SciPaperLoader to handle CSV files with thousands of papers without timing out or blocking the web interface.
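
The upload flow above can be sketched as a plain function that parses the CSV and reports progress through a callback. This keeps the parsing logic testable without a broker. The name `process_csv`, its `every` parameter, and the per-row cleanup are hypothetical. In a bound Celery task (`@celery.task(bind=True)`), the callback could call `self.update_state(state="PROGRESS", meta={"current": current, "total": total})`, and the page could poll the task state to drive the progress bar. How SciPaperLoader actually wires this up may differ.

```python
import csv
from typing import Callable, Dict, List, TextIO

# Receives (rows_processed, total_rows); hypothetical signature.
ProgressCallback = Callable[[int, int], None]


def process_csv(
    file_obj: TextIO, on_progress: ProgressCallback, every: int = 100
) -> List[Dict[str, str]]:
    """Parse a CSV of papers, reporting progress every `every` rows and at the end."""
    # Read all rows up front so the total is known for the progress bar.
    rows = list(csv.DictReader(file_obj))
    total = len(rows)
    papers: List[Dict[str, str]] = []
    for index, row in enumerate(rows, start=1):
        # Placeholder per-row work: strip whitespace from every string field.
        papers.append(
            {key: value.strip() for key, value in row.items() if isinstance(value, str)}
        )
        if index % every == 0 or index == total:
            on_progress(index, total)
    return papers
```

For example, `process_csv(io.StringIO("title,doi\nPaper A,10.1/a\n"), print)` prints `1 1` once. Reading every row into memory is a simplification. A very large file could be counted in a separate first pass instead.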

## Configuration

Default configuration is loaded from `scipaperloader.defaults` and can be
overridden by environment variables with a `FLASK_` prefix. See
[Configuring from Environment Variables](https://flask.palletsprojects.com/en/3.0.x/config/#configuring-from-environment-variables).

### Celery Configuration

The following environment variables can be set to configure Celery:

- `FLASK_CELERY_BROKER_URL`: Redis URL for the message broker (default: `redis://localhost:6379/0`)
- `FLASK_CELERY_RESULT_BACKEND`: Redis URL for storing task results (default: `redis://localhost:6379/0`)

Consider using
[dotenv](https://flask.palletsprojects.com/en/3.0.x/cli/#environment-variables-from-dotenv)
to keep these variables in a `.env` file instead of exporting them in every shell.

## Deployment

See [Deploying to Production](https://flask.palletsprojects.com/en/3.0.x/deploying/).

You may use the distribution (`make dist`) to publish the package to a package index,
deliver it to your server, or copy it into your `Dockerfile`, and then install it with `pip`.

In production you must set
[SECRET_KEY](https://flask.palletsprojects.com/en/3.0.x/tutorial/deploy/#configure-the-secret-key)
to a secret, stable value.
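
One way to produce such a value is Python's `secrets` module, which is also what the linked Flask tutorial suggests. Generate the key once and keep it, for example in a `FLASK_SECRET_KEY` environment variable picked up through the `FLASK_` prefix mechanism described under Configuration. Regenerating the key on every start would invalidate existing signed session cookies. The `generate_secret_key` helper is a hypothetical convenience wrapper.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for Flask's SECRET_KEY."""
    # token_hex encodes each random byte as two hex characters.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```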

### Deploying with Celery

When deploying to production:

1. Configure a production-ready Redis instance or use a managed service
2. Run Celery workers as system services or in Docker containers
3. Consider setting up monitoring for your Celery tasks and workers