
CLAUDE.md — Letters & Poetry Project

Project Overview

A Python project that downloads, parses, and displays historic love letters and classic poetry from Project Gutenberg. Data is pre-parsed and committed to git so end users don't need to download anything. A web UI is served via the hicalsoft.github.io GitHub Pages site.

Repository Structure

├── download_letters.py          # Downloads & parses 11 letter sources from Gutenberg
├── download_poetry.py           # Downloads & parses 15 poetry sources from Gutenberg
├── generate_web_data.py         # Combines JSON files into web UI data (letters.json / poetry.json)
├── love_letters.py              # CLI app: displays random letters in the terminal
├── letters/                     # Pre-parsed letter JSON files (11 sources, ~1,307 letters)
├── poetry/                      # Pre-parsed poetry JSON files (15 sources, ~3,098 poems)
├── hicalsoft.github.io/         # Embedded repo for GitHub Pages web UI (separate git history)
│   ├── letters/index.html       # Standalone SPA for browsing letters
│   ├── letters/data/letters.json
│   ├── poetry/index.html        # Standalone SPA for browsing poetry
│   └── poetry/data/poetry.json
└── README.md

Key Commands

# Download all letter sources (requires internet)
python3 download_letters.py

# Download all poetry sources (requires internet)
python3 download_poetry.py

# Run CLI app (no internet needed — reads from letters/ directory)
python3 love_letters.py                  # Random letter
python3 love_letters.py --list           # List sources
python3 love_letters.py --source napoleon  # Filter by source
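The commands above can be sketched as a small argparse CLI, under some assumptions. Each file in letters/ is taken to be a JSON list of objects with "author" and "text" keys. The file stem is treated as the source name that --source matches against, case-insensitively and by substring. The helper names are hypothetical, and the real love_letters.py may be organized differently.

```python
from __future__ import annotations

import argparse
import json
import random
from pathlib import Path

LETTERS_DIR = Path("letters")  # pre-parsed JSON files, one per source


def load_sources(letters_dir: Path = LETTERS_DIR) -> dict[str, list[dict]]:
    """Map each source name (assumed: the JSON file stem) to its list of letters."""
    sources: dict[str, list[dict]] = {}
    for path in sorted(letters_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            sources[path.stem] = json.load(f)
    return sources


def pick_letter(sources: dict[str, list[dict]], source_filter: str | None = None,
                rng: random.Random | None = None) -> tuple[str, dict]:
    """Pick a uniformly random (source, letter) pair, optionally filtered by source name."""
    rng = rng if rng is not None else random.Random()
    candidates = [
        (name, letter)
        for name, letters in sources.items()
        if source_filter is None or source_filter.lower() in name.lower()
        for letter in letters
    ]
    if not candidates:
        raise SystemExit(f"No letters found for source {source_filter!r}")
    return rng.choice(candidates)


def main(argv: list[str] | None = None, letters_dir: Path = LETTERS_DIR) -> None:
    parser = argparse.ArgumentParser(description="Show a random historic love letter.")
    parser.add_argument("--list", action="store_true", help="list available sources")
    parser.add_argument("--source", help="only use sources whose name contains this text")
    args = parser.parse_args(argv)

    sources = load_sources(letters_dir)
    if args.list:
        for name, letters in sources.items():
            print(f"{name}: {len(letters)} letters")
        return
    name, letter = pick_letter(sources, args.source)
    # "author" and "text" are assumed field names.
    print(f"[{name}] {letter.get('author', '')}\n")
    print(letter.get("text", ""))
```

To make it a script, call main() under the usual `if __name__ == "__main__":` guard. `main(["--source", "napoleon"])` mirrors the last command above. Picking uniformly over all matching letters means larger sources come up more often than small ones.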

Data Pipeline

  1. Download scripts fetch raw .txt files from Project Gutenberg via urllib
  2. Each source has a custom extractor function that parses the Gutenberg text format
  3. Parsed data is saved as JSON to letters/ or poetry/ directories
  4. generate_web_data.py combines the individual JSON files into letters.json / poetry.json for the web UI
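Steps 1 to 3 might look roughly like the sketch below. Several pieces are assumptions for illustration rather than the project's actual code:

- the Gutenberg URL pattern
- the hypothetical `process_source` helper
- the `extract` signature (raw text in, list of dicts out)
- the pretty-printed JSON output

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path
from typing import Callable

# Assumed plain-text URL pattern (Gutenberg's cache/epub layout); adjust if a book lives elsewhere.
GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt"


def fetch_text(book_id: int, timeout: float = 30.0) -> str:
    """Step 1: download a book's raw .txt file with urllib."""
    url = GUTENBERG_URL.format(id=book_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_source(name: str, items: list[dict], out_dir: Path) -> Path:
    """Step 3: write one source's parsed items to <out_dir>/<name>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)
    return path


def process_source(name: str, book_id: int, extract: Callable[[str], list[dict]],
                   out_dir: Path, fetch: Callable[[int], str] = fetch_text) -> int:
    """Run steps 1-3 for one source and return how many items were extracted."""
    raw = fetch(book_id)               # step 1: raw Gutenberg text
    items = extract(raw)               # step 2: source-specific extractor
    save_source(name, items, out_dir)  # step 3: per-source JSON
    return len(items)
```

Something like `process_source("napoleon", 37499, extract_napoleon, Path("letters"))` would refresh a single letter file. The `fetch` parameter exists only so an extractor can be exercised offline with canned text.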

Adding new sources or regenerating data

# Step 1: Download/parse source data (requires internet)
python3 download_letters.py          # Re-download all letter sources
python3 download_poetry.py           # Re-download all poetry sources

# Step 2: Regenerate web UI JSON (no internet needed)
python3 generate_web_data.py              # Both letters + poetry
python3 generate_web_data.py --letters    # Letters only
python3 generate_web_data.py --poetry     # Poetry only

# Step 3: Commit changes to both repos
git add -A && git commit -m "Update data"
cd hicalsoft.github.io && git add -A && git commit -m "Update data"
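The step 2 flags and the combining step could be sketched as follows. The `selected_targets` and `combine` helpers are hypothetical. So is the output schema: one object keyed by source file stem, where each value is the list stored in that source's JSON. The output paths follow the repository tree, but the real generate_web_data.py may shape its JSON differently.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path

# (input dir, combined output file) per collection, following the repository layout.
TARGETS = {
    "letters": (Path("letters"), Path("hicalsoft.github.io/letters/data/letters.json")),
    "poetry": (Path("poetry"), Path("hicalsoft.github.io/poetry/data/poetry.json")),
}


def selected_targets(argv: list[str] | None = None) -> list[str]:
    """Return the collections to rebuild; passing neither flag means both."""
    parser = argparse.ArgumentParser(description="Rebuild web UI JSON.")
    parser.add_argument("--letters", action="store_true", help="letters only")
    parser.add_argument("--poetry", action="store_true", help="poetry only")
    args = parser.parse_args(argv)
    if not (args.letters or args.poetry):
        return ["letters", "poetry"]
    return [name for name in ("letters", "poetry") if getattr(args, name)]


def combine(in_dir: Path, out_path: Path) -> int:
    """Merge every <in_dir>/*.json into one object keyed by file stem; return the item count."""
    combined = {}
    for path in sorted(in_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            combined[path.stem] = json.load(f)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # Compact separators keep the file small for the browser's fetch().
        json.dump(combined, f, ensure_ascii=False, separators=(",", ":"))
    return sum(len(items) for items in combined.values())


def main(argv: list[str] | None = None) -> None:
    for name in selected_targets(argv):
        in_dir, out_path = TARGETS[name]
        print(f"{name}: {combine(in_dir, out_path)} items -> {out_path}")
```

With no flags, both files are rebuilt, matching the first step 2 command. Passing `--letters` or `--poetry` narrows the run to one collection.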

To add a new letter or poetry source:

  1. Add an extractor function to download_letters.py or download_poetry.py
  2. Add the source to the SOURCES list in that file's main()
  3. Run the download script to generate the new JSON in letters/ or poetry/
  4. Run python3 generate_web_data.py to rebuild the combined web UI data
  5. Commit to both repositories
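Steps 1 and 2 are sketched below for an imaginary source. Everything here is hypothetical:

- `extract_example_letters` and its salutation-splitting heuristic
- the `(name, Gutenberg ID, extractor)` tuple shape of SOURCES
- the placeholder ID 0

Where the existing entries in download_letters.py or download_poetry.py differ, follow them instead.

```python
from __future__ import annotations

import re
from typing import Callable

Extractor = Callable[[str], list[dict]]

# Hypothetical heuristic: each letter opens with a "My dear ..." or "Dearest ..." line.
SALUTATION_RE = re.compile(r"^(?:My dear|Dearest)\b", re.MULTILINE)


def extract_example_letters(text: str) -> list[dict]:
    """Hypothetical extractor: one letter per salutation, ignoring any preamble."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # normalize line endings first
    starts = [m.start() for m in SALUTATION_RE.finditer(text)]
    letters = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        letters.append({"number": i + 1, "text": text[start:end].strip()})
    return letters


# Hypothetical SOURCES shape: (output name, Gutenberg ID, extractor).
SOURCES: list[tuple[str, int, Extractor]] = [
    # ... existing sources ...
    ("example_letters", 0, extract_example_letters),  # 0 = placeholder ebook ID
]
```

Splitting on positions found with `finditer` keeps any front matter before the first salutation out of the results. That matters for Gutenberg texts, which often open with a preface.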

Gutenberg Parsing Notes

  • Line endings: Always normalize with .replace("\r\n", "\n").replace("\r", "\n") before regex splitting
  • START/END markers vary ("*** START OF THE PROJECT GUTENBERG EBOOK", "*** START OF THIS PROJECT GUTENBERG EBOOK", etc.), so match them with a regex rather than a fixed string
  • Each source needs a custom extractor due to unique formatting (Roman numerals, ALL CAPS titles, numbered entries, etc.)
  • CONTENTS sections often repeat the same headings as the actual content, so find the 2nd occurrence of a heading or verify its context before treating it as the start of a text
  • Poe is the most complex: extract_poe() uses section tracking (poem_sections vs non_poem_sections) to extract only from the 4 actual poetry sections, skipping the memoir, notes, prose, and essays
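The line-ending, marker and CONTENTS notes translate into a few small helpers, sketched below. The function names and the marker regex are assumptions, not code lifted from the download scripts. The regex covers the THE/THIS variants quoted above, case-insensitively, and allows an optional hyphen in "E-BOOK".

```python
import re

# Assumed marker patterns: "THE" or "THIS", "EBOOK" or "E-BOOK", any case.
START_RE = re.compile(
    r"^\*{3}\s*START OF (?:THE|THIS) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*{3}\s*END OF (?:THE|THIS) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def normalize(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF before any regex splitting."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_gutenberg(text: str) -> str:
    """Return the text between the START and END marker lines (or all of it if absent)."""
    text = normalize(text)
    start = START_RE.search(text)
    body_start = start.end() if start else 0
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()


def find_body_heading(text: str, heading: str) -> int:
    """Offset of `heading` in the body rather than in the CONTENTS list.

    Uses the 2nd exact-line occurrence when there is one, else the 1st;
    returns -1 if the heading never appears on a line of its own.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(heading)}[ \t]*$", re.MULTILINE)
    hits = [m.start() for m in pattern.finditer(text)]
    if not hits:
        return -1
    return hits[1] if len(hits) > 1 else hits[0]
```

Matching whole lines means a heading such as "LETTER I" does not also hit "LETTER II". The "2nd occurrence" rule is only a default; sources whose CONTENTS list is absent or formatted differently still need the context checks mentioned above.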

Letter Sources (11)

Source                                Gutenberg ID  Extractor
Henry VIII to Anne Boleyn             22009         extract_henry_viii
Mary Wollstonecraft to Gilbert Imlay  3529          extract_wollstonecraft
Abelard & Héloïse                     35977         extract_abelard_heloise
Napoleon to Josephine                 37499         extract_napoleon
Keats to Fanny Brawne                 35698         extract_keats_brawne
Browning Letters Vol 1                50400         _extract_browning
Browning Letters Vol 2                51263         _extract_browning
Burns to Clarinda                     6131          extract_burns_clarinda
Dorothy Osborne                       34387         extract_dorothy_osborne
Beethoven                             13065         extract_beethoven
Mozart                                5307          extract_mozart

Poetry Sources (15)

Source                   Gutenberg ID  Extractor
Shakespeare Sonnets      1041          extract_shakespeare_sonnets
Emily Dickinson          12242         extract_dickinson
Walt Whitman             1322          extract_whitman
William Blake            1934          extract_blake
John Keats               23684         extract_keats
Edgar Allan Poe          10031         extract_poe
E.B. Browning Sonnets    2002          extract_browning_sonnets
T.S. Eliot               1321          extract_eliot_wasteland
Robert Frost (Mountain)  29345         extract_frost_mountain
Robert Frost (Selected)  59824         extract_frost_selected
W.B. Yeats               32233         extract_yeats
Omar Khayyám             246           extract_khayyam
Robert Burns             1279          extract_burns
William Wordsworth       9622          extract_wordsworth
Percy Shelley            4800          extract_shelley

Web UI Architecture

Both /letters and /poetry pages are standalone SPAs (no Jekyll dependency). They:

  • Load a single combined JSON file via fetch()
  • Match the site's dark neumorphism theme (bg #2b2d2f, text #fff)
  • Use a red accent (#ff073a) for letters and a purple accent (#c084fc) for poetry
  • Provide an author/poet sidebar, a card grid, a random button, and a detail view with font controls
  • Support keyboard navigation: ←/→ arrows, Escape to go back, R for random
  • Auto-fit the font size, calculating the ideal size from the container width and the longest line length
  • Let the manual A+/A- buttons override auto-fit; clicking the "auto" label resets it
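The auto-fit rule can be expressed as a small formula. The sketch below is in Python for consistency with the rest of the project; the pages themselves presumably implement it in JavaScript inside each index.html. The 0.6 average character-width ratio and the 12 to 28 px clamp are assumed values, not taken from the pages.

```python
from __future__ import annotations


def auto_font_size(container_width_px: float, lines: list[str],
                   char_width_ratio: float = 0.6,
                   min_px: float = 12.0, max_px: float = 28.0) -> float:
    """Largest font size (px) at which the longest line fits the container width.

    Assumes each character is about char_width_ratio * font_size wide, so a line
    of n characters needs roughly n * char_width_ratio * size pixels.
    """
    longest = max((len(line) for line in lines), default=0)
    if longest == 0:
        return max_px
    ideal = container_width_px / (longest * char_width_ratio)
    return max(min_px, min(max_px, ideal))  # clamp to a readable range


def effective_font_size(auto_px: float, manual_px: float | None = None) -> float:
    """A manual A+/A- size wins; None (after clicking "auto") falls back to auto-fit."""
    return auto_px if manual_px is None else manual_px
```

With the defaults, a 50-character longest line in a 600 px container gives 600 / (50 × 0.6) = 20 px. Short poems hit the 28 px ceiling, and very long lines bottom out at 12 px rather than shrinking further.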

Change Log

Fix Poe parser and add font size controls

  • Rewrote extract_poe() with section tracking (poem_sections vs non_poem_sections)
  • Only extracts from 4 poetry sections, skips memoir/notes/prose/essays/dedications
  • Result: 51 clean poems (previously 108, including non-poem junk)
  • Added _is_title(), _save_current(), _norm() helper functions
  • Added skip_titles set for sub-headings (PREFACE, dedications, etc.)
  • Renames "Part I/II" → "Al Aaraaf — Part I/II"

Add poetry collection

  • Created download_poetry.py with 15 Gutenberg extractors
  • 3,098 poems from 15 sources stored in poetry/ as JSON
  • Created /poetry web page matching site theme

Remove letter truncation

  • Removed truncate_letter() — shows full letter text

Restructure: pre-downloaded letters

  • Moved from download-on-run to pre-parsed JSON in letters/
  • Added 6 new sources (Browning Vols 1 and 2, Burns, Osborne, Beethoven, Mozart)
  • 1,307 letters from 11 sources

Initial love letters app

  • Created love_letters.py CLI with 5 initial sources
  • Added download_letters.py for fetching and parsing Gutenberg texts