{% extends "base.html.jinja" %} {% block content %}
<h1 class="mb-4">📘 About This App</h1>
<p class="lead">
<strong>The Research Paper Scraper</strong> is a lightweight web-based tool
designed to help researchers manage and download large sets of academic papers
efficiently, using only a list of DOIs.
</p>
<hr class="my-4" />
<section class="mb-5">
<h2 class="h4">🔍 What It Does</h2>
<p>
This app automates the process of downloading research paper PDFs based on
metadata provided in a CSV file. It's especially useful when dealing with
hundreds or thousands of papers you want to collect for offline access or
analysis.
</p>
<p>
You simply upload a structured CSV file with paper metadata, and the system
takes care of the rest: importing, organizing, and downloading each paper
in the background.
</p>
</section>
<section class="mb-5">
<h2 class="h4">⚙️ How It Works</h2>
<h5 class="mt-4">1. CSV Import</h5>
<p>
Users start by uploading a CSV file that contains metadata for many papers
(such as title, DOI, ISSN, etc.). The app stores only the fields it needs
(the DOI, title, and publication date) and validates each entry before
importing it into the internal database.
</p>
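<p>
For illustration, here is a minimal Python sketch of that validation step.
The column names <code>doi</code>, <code>title</code>, and <code>date</code>,
and the date format, are assumptions, not the app's actual CSV schema.
</p>
<pre><code>import csv
from datetime import datetime

# Hypothetical column names; the real CSV header may differ.
REQUIRED = ("doi", "title", "date")

def load_rows(path):
    """Yield only rows that carry the fields the app keeps."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not all(row.get(k, "").strip() for k in REQUIRED):
                continue  # skip incomplete entries
            try:
                # Assume ISO dates; the real format may differ.
                datetime.strptime(row["date"], "%Y-%m-%d")
            except ValueError:
                continue  # skip rows with an unparseable date
            yield {k: row[k].strip() for k in REQUIRED}
</code></pre>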
<h5 class="mt-4">2. Metadata Management</h5>
<p>Each paper is stored in a local SQLite database, along with its status:</p>
<ul>
<li><strong>Pending</strong>: Ready to be downloaded.</li>
<li><strong>Done</strong>: Successfully downloaded.</li>
<li><strong>Failed</strong>: Something went wrong (e.g. PDF not found).</li>
</ul>
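<p>
As a rough sketch, the underlying table could look like the following.
The table and column names are illustrative only, not the app's real schema.
</p>
<pre><code>import sqlite3

# Illustrative schema; the app's actual table layout may differ.
conn = sqlite3.connect("papers.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS papers (
        doi    TEXT PRIMARY KEY,
        title  TEXT NOT NULL,
        date   TEXT,
        status TEXT NOT NULL DEFAULT 'pending'
               CHECK (status IN ('pending', 'done', 'failed')),
        pdf_path TEXT  -- filled in once the download succeeds
    )
""")
conn.commit()
</code></pre>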
<h5 class="mt-4">3. Background Scraping</h5>
<p>
A separate background process runs 24/7, automatically downloading papers
based on a configurable hourly schedule. It uses tools like the Zotero API
to fetch the best available version of each paper (ideally as a PDF), and
stores them on disk in neatly organized folders, one per paper.
</p>
<p>
To avoid triggering download limits or spam detection, download times are
<strong>randomized within each hour</strong> to mimic natural behavior.
</p>
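<p>
The randomization might work along these lines. This is a sketch only:
<code>fetch_pdf</code> is a hypothetical stand-in for the real download
logic, and the queueing details are assumed.
</p>
<pre><code>import random
import time

def run_hour(pending, quota):
    """Download up to `quota` papers at random moments within one hour."""
    offsets = sorted(random.uniform(0, 3600) for _ in range(quota))
    start = time.monotonic()
    for offset, paper in zip(offsets, pending):
        # Sleep until this paper's randomized slot comes up.
        time.sleep(max(0.0, start + offset - time.monotonic()))
        fetch_pdf(paper)  # hypothetical downloader, e.g. via the Zotero API
</code></pre>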
<h5 class="mt-4">4. Smart Scheduling</h5>
<p>
You can set how many papers the system should attempt to download during
each hour of the day. This allows you to, for example, schedule more
downloads during the daytime and pause at night, or tailor usage to match
your institution's bandwidth or rate limits.
</p>
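<p>
One plausible shape for such a schedule is a simple per-hour quota map.
The values and the name <code>HOURLY_QUOTA</code> are illustrative, not
the app's actual configuration format.
</p>
<pre><code># Papers to attempt per hour of the day (0-23); illustrative values.
HOURLY_QUOTA = {hour: 0 for hour in range(24)}      # default: paused
HOURLY_QUOTA.update({h: 10 for h in range(9, 18)})  # busier during the day
HOURLY_QUOTA.update({h: 2 for h in range(18, 22)})  # taper off in the evening
</code></pre>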
<h5 class="mt-4">5. Easy Web Interface</h5>
<p>Everything is managed through a simple, responsive web interface:</p>
<ul>
<li>📥 Upload CSV files</li>
<li>📄 Track the status of each paper</li>
<li>⚠️ See which downloads failed, and why</li>
<li>📂 Download PDFs directly from the browser</li>
<li>🕒 Adjust the hourly download schedule</li>
</ul>
<p>
No command-line tools or scripts are required: everything works in your
browser.
</p>
</section>
<section class="mb-5">
<h2 class="h4">📦 File Storage</h2>
<p>
Downloaded PDFs are saved to a structured folder on the server, with each
paper in its own directory based on the DOI. The app never stores files
inside the database, only references to where each PDF is located.
</p>
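<p>
DOIs contain characters such as <code>/</code> that are unsafe in file
names, so the per-paper directory presumably uses a sanitized form of the
DOI, roughly like this (the storage root and helper name are assumptions):
</p>
<pre><code>import re
from pathlib import Path

STORAGE_ROOT = Path("/srv/papers")  # assumed location, not the app's real path

def paper_dir(doi):
    """Map a DOI to its own folder, replacing filesystem-unsafe characters."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", doi)
    return STORAGE_ROOT / safe

# e.g. paper_dir("10.1000/xyz123") yields /srv/papers/10.1000_xyz123
</code></pre>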
</section>
<section class="mb-5">
<h2 class="h4">🔒 Simple & Local</h2>
<p>
This app is designed for internal use on a local server or research
workstation. It does not send or expose data to third parties. Everything,
from file storage to scheduling, happens locally, giving you full control
over your paper collection process.
</p>
</section>
<section class="mb-5">
<h2 class="h4">💡 Who It's For</h2>
<p>This tool is ideal for:</p>
<ul>
<li>Research assistants organizing large literature datasets</li>
<li>Labs preparing reading archives for team members</li>
<li>Faculty compiling papers for courses or research reviews</li>
<li>Anyone needing a structured way to fetch and track papers in bulk</li>
</ul>
</section>
{% endblock content %}