{% extends "base.html" %} {% block content %}
The Research Paper Scraper is a lightweight, web-based tool that helps researchers manage and download large sets of academic papers efficiently, starting from little more than a list of DOIs.
This app automates the process of downloading research paper PDFs based on metadata provided in a CSV file. It's especially useful when dealing with hundreds or thousands of papers you want to collect for offline access or analysis.
You simply upload a structured CSV file with paper metadata, and the system takes care of the rest: importing, organizing, and downloading each paper in the background.
Users start by uploading a CSV file that contains metadata for many papers (title, DOI, ISSN, and so on). The app stores only the fields it needs, such as the DOI, title, and publication date, and validates each entry before importing it into the internal database.
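The import step can be sketched as follows. This is a minimal illustration, not the app's actual code: the column names and the DOI validation rule are assumptions.

```python
import csv
import re

# Columns the app keeps; any other CSV fields are discarded.
# These header names are illustrative -- the real CSV may differ.
KEPT_FIELDS = ("doi", "title", "publication_date")

# Rough shape of a DOI: "10.", a registrant prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def parse_paper_rows(path):
    """Yield validated rows, keeping only the fields the app needs."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = (row.get("doi") or "").strip()
            if not DOI_PATTERN.match(doi):
                continue  # skip rows with a missing or malformed DOI
            yield {field: (row.get(field) or "").strip() for field in KEPT_FIELDS}
```

Rows that fail validation are simply skipped here; a real import would more likely report them back to the user.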
Each paper is stored in a local SQLite database, along with its status:
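A table along these lines could back that storage. The column and status names below are assumptions for illustration, not the app's actual schema.

```python
import sqlite3

# Illustrative schema: one row per paper, with a status column
# tracking its progress through the pipeline.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id               INTEGER PRIMARY KEY,
    doi              TEXT UNIQUE NOT NULL,
    title            TEXT,
    publication_date TEXT,
    status           TEXT NOT NULL DEFAULT 'pending',  -- e.g. pending / downloaded / failed
    pdf_path         TEXT                              -- where the file lives on disk
)
"""

def open_db(path="papers.db"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn
```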
A separate background process runs 24/7, automatically downloading papers based on a configurable hourly schedule. It uses tools like the Zotero API to fetch the best available version of each paper (ideally as a PDF), and stores them on disk in neatly organized folders, one per paper.
To avoid triggering download limits or spam detection, download times are randomized within each hour to mimic natural behavior.
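One way to implement that randomization is to pick the scheduled offsets for each hour up front, rather than downloading at fixed intervals. A small sketch (the function name and the seconds-based granularity are assumptions):

```python
import random

def download_times_for_hour(quota, seed=None):
    """Pick `quota` distinct second-offsets within a one-hour window,
    so downloads are spread randomly across the hour instead of
    firing at the top of it."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(3600), quota))
```

A background worker would compute these offsets at the start of each hour and sleep until each one comes due.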
You can set how many papers the system should attempt to download during each hour of the day. This allows you to, for example, schedule more downloads during the daytime and pause at night, or tailor usage to match your institution's bandwidth or rate limits.
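Conceptually, that schedule is just a 24-entry table mapping each hour of the day to a quota. The values below are hypothetical, not defaults:

```python
# Hypothetical per-hour quota table: index = hour of day (0-23),
# value = number of downloads to attempt during that hour.
HOURLY_QUOTA = [0] * 24
for hour in range(8, 18):     # daytime hours: allow up to 10 papers each
    HOURLY_QUOTA[hour] = 10

def quota_for(hour):
    """Return the download quota for the given hour of the day."""
    return HOURLY_QUOTA[hour % 24]
```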
Everything is managed through a simple, responsive web interface:
No command-line tools or scripts are required: everything works in your browser.
Downloaded PDFs are saved to a structured folder on the server, with each paper in its own directory based on the DOI. The app never stores files inside the database, only references to where each PDF is located.
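Because every DOI contains a `/` (and may contain other characters that are awkward in file names), the DOI has to be sanitized before it can name a directory. A sketch of that mapping, with an assumed storage root and replacement rule:

```python
import re
from pathlib import Path

# Assumed storage location; in practice this would be configurable.
STORAGE_ROOT = Path("storage/papers")

def paper_dir(doi):
    """Map a DOI to its own directory under the storage root,
    replacing characters that are unsafe in file names
    (including the '/' present in every DOI)."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", doi)
    return STORAGE_ROOT / safe
```

The database's `pdf_path`-style reference would then point into the directory this function returns.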
This app is designed for internal use on a local server or research workstation. It does not send or expose data to third parties. Everything, from file storage to scheduling, happens locally, giving you full control over your paper collection process.
This tool is ideal for: