{% extends "base.html.jinja" %} {% block content %}

📘 About This App

The Research Paper Scraper is a lightweight web-based tool designed to help researchers manage and download large sets of academic papers efficiently, using only a list of DOIs.


🔍 What It Does

This app automates downloading research paper PDFs based on metadata provided in a CSV file. It’s especially useful when you want to collect hundreds or thousands of papers for offline access or analysis.

You upload a structured CSV file with paper metadata, and the system takes care of the rest: importing, organizing, and downloading each paper in the background.

⚙️ How It Works

1. CSV Import

Users start by uploading a CSV file that contains metadata for many papers (such as title, DOI, and ISSN). The app stores only the fields it needs, such as the DOI, title, and publication date, and validates each entry before importing it into the internal database.
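
As a rough illustration of the import step, the sketch below keeps only the needed fields and rejects rows without a plausible DOI or a title. The column names (`doi`, `title`, `publication_date`), the `parse_papers` helper, and the DOI pattern are assumptions made for this example, not the app's actual code.

```python
import csv
import io
import re

# Hypothetical column names; the real CSV layout may differ.
REQUIRED_FIELDS = ("doi", "title", "publication_date")

# Loose DOI shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def parse_papers(csv_text):
    """Split CSV rows into (valid, rejected), keeping only the needed fields."""
    valid, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Drop every column the app does not need (ISSN, etc.).
        record = dict((name, (row.get(name) or "").strip()) for name in REQUIRED_FIELDS)
        if DOI_PATTERN.match(record["doi"]) and record["title"]:
            valid.append(record)
        else:
            rejected.append(row)
    return valid, rejected
```

Rows that fail the check are returned separately so they could be reported back to the user instead of being silently dropped.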

2. Metadata Management

Each paper is stored in a local SQLite database, along with its status:

3. Background Scraping

A separate background process runs 24/7, automatically downloading papers based on a configurable hourly schedule. It uses tools like the Zotero API to fetch the best available version of each paper (ideally as a PDF), and stores them on disk in neatly organized folders, one per paper.

To avoid triggering download limits or spam detection, download times are randomized within each hour to mimic natural behavior.
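
One simple way to randomize download times within an hour is to draw distinct second offsets at random. This is a minimal sketch under that assumption; `random_offsets` is a hypothetical helper, and the app may spread its downloads differently.

```python
import random


def random_offsets(count, rng=None):
    """Return `count` distinct, sorted second offsets (0-3599) within one hour."""
    rng = rng or random.Random()
    # Clamp to the number of seconds available in an hour.
    count = min(max(count, 0), 3600)
    return sorted(rng.sample(range(3600), count))
```

A scheduler could call this at the top of each hour and start one download at each returned offset.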

4. Smart Scheduling

You can set how many papers the system should attempt to download during each hour of the day. This allows you to, for example, schedule more downloads during the daytime and pause at night, or tailor usage to match your institution’s bandwidth or rate limits.
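
Conceptually, such a schedule is a mapping from hour of day to a download quota. The sketch below assumes that representation; `SCHEDULE`, its values, and `quota_for` are illustrative names, not the app's actual configuration format.

```python
from datetime import datetime

# Hypothetical schedule: hour of day (0-23) mapped to papers to attempt.
# Here: 10 papers per hour from 08:00 to 19:59, nothing overnight.
SCHEDULE = dict((hour, 10) for hour in range(8, 20))


def quota_for(moment, schedule=SCHEDULE):
    """Return how many papers to attempt in the hour containing `moment`."""
    # Hours missing from the schedule default to 0, which pauses downloads.
    return schedule.get(moment.hour, 0)
```

For example, `quota_for(datetime.now())` would give the quota for the current hour.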

5. Easy Web Interface

Everything is managed through a simple, responsive web interface:

No command-line tools or scripts are required: everything works in your browser.

📦 File Storage

Downloaded PDFs are saved to a structured folder on the server, with each paper in its own directory based on the DOI. The app never stores files inside the database, only references to where each PDF is located.
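
Because DOIs contain `/` and other characters that are awkward in file names, a DOI-based directory needs some sanitizing. The sketch below shows one possible mapping; the `paper_dir` helper and its replacement rule are assumptions for illustration, and the app's real folder naming may differ.

```python
import re
from pathlib import Path


def paper_dir(base_dir, doi):
    """Map a DOI to a per-paper directory under base_dir."""
    # Replace anything outside a conservative whitelist with "_".
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", doi.strip().lower())
    # Refuse empty or dots-only names such as "." and "..".
    if not safe.strip("."):
        raise ValueError("DOI does not yield a usable directory name")
    return Path(base_dir) / safe
```

With this rule, `paper_dir("papers", "10.1000/XYZ-123")` resolves to `papers/10.1000_xyz-123`. Two different DOIs could in principle map to the same name, so a real implementation would need to handle collisions.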

🔒 Simple & Local

This app is designed for internal use on a local server or research workstation. It does not send or expose data to third parties. Everything, from file storage to scheduling, happens locally, giving you full control over your paper collection process.

💡 Who It's For

This tool is ideal for:

{% endblock content %}