CLAUDE.md — Letters & Poetry Project
Project Overview
A Python project that downloads, parses, and displays historic love letters and classic poetry from Project Gutenberg. Data is pre-parsed and committed to git so end users don't need to download anything. A web UI is served via the hicalsoft.github.io GitHub Pages site.
Repository Structure
├── download_letters.py # Downloads & parses 11 letter sources from Gutenberg
├── download_poetry.py # Downloads & parses 15 poetry sources from Gutenberg
├── love_letters.py # CLI app: displays random letters in the terminal
├── letters/ # Pre-parsed letter JSON files (11 sources, ~1,307 letters)
├── poetry/ # Pre-parsed poetry JSON files (15 sources, ~3,098 poems)
├── hicalsoft.github.io/ # Embedded repo for GitHub Pages web UI (separate git history)
│ ├── letters/index.html # Standalone SPA for browsing letters
│ ├── letters/data/letters.json
│ ├── poetry/index.html # Standalone SPA for browsing poetry
│ └── poetry/data/poetry.json
└── README.md
Key Commands
# Download all letter sources (requires internet)
python3 download_letters.py
# Download all poetry sources (requires internet)
python3 download_poetry.py
# Run CLI app (no internet needed — reads from letters/ directory)
python3 love_letters.py # Random letter
python3 love_letters.py --list # List sources
python3 love_letters.py --source napoleon # Filter by source
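The flags above could be wired up roughly as follows. This is a minimal sketch, not the actual contents of `love_letters.py`: the helper names (`build_parser`, `source_files`, `pick_letter`, `main`) are hypothetical, and it assumes each file in `letters/` is a JSON list of letter objects with a `body` key and that the source name appears in the filename.

```python
import argparse
import glob
import json
import os
import random


def build_parser():
    """Build a parser for the two flags shown above (--list, --source)."""
    parser = argparse.ArgumentParser(description="Show a random historic love letter")
    parser.add_argument("--list", action="store_true", help="list available sources")
    parser.add_argument("--source", help="only use source files whose name matches")
    return parser


def source_files(letters_dir="letters", source=None):
    """Return the JSON files in letters_dir, optionally filtered by name.

    Assumption: the source name is part of the filename, e.g. 'napoleon.json'.
    """
    paths = sorted(glob.glob(os.path.join(letters_dir, "*.json")))
    if source:
        paths = [p for p in paths if source.lower() in os.path.basename(p).lower()]
    return paths


def pick_letter(paths, rng=random):
    """Load every letter from the given files and return one at random."""
    letters = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            letters.extend(json.load(f))
    return rng.choice(letters) if letters else None


def main(argv=None):
    """Entry point: list sources or print the body of one random letter."""
    args = build_parser().parse_args(argv)
    files = source_files(source=args.source)
    if args.list:
        for path in files:
            print(os.path.splitext(os.path.basename(path))[0])
        return
    letter = pick_letter(files)
    print(letter["body"] if letter else "No letters found.")
```

Under these assumptions, `main(["--source", "napoleon"])` would print one letter drawn only from files whose name contains "napoleon".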
Data Pipeline
- Download scripts fetch raw `.txt` files from Project Gutenberg via `urllib`
- Each source has a custom extractor function that parses the Gutenberg text format
- Parsed data is saved as JSON to `letters/` or `poetry/` directories
- A separate step combines the individual JSON files into `letters.json` / `poetry.json` for the web UI (these live in `hicalsoft.github.io/*/data/`)
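The fetch, extract, and save steps can be sketched as below. This is an illustration under stated assumptions rather than the code in the download scripts: the URL pattern is one common plain-text location on gutenberg.org and may not be the one the scripts use, and `fetch_text`, `save_parsed`, and `run_source` are hypothetical names.

```python
import json
import os
import urllib.request

# Assumption: one common plain-text URL pattern on gutenberg.org; the real
# scripts may use a different path or mirror.
GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"


def fetch_text(book_id, timeout=30):
    """Download the raw .txt for one Gutenberg ID (requires internet)."""
    url = GUTENBERG_URL.format(book_id=book_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_parsed(records, out_dir, name):
    """Write a list of parsed records to <out_dir>/<name>.json; return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path


def run_source(book_id, extractor, out_dir, name, fetch=fetch_text):
    """Fetch one source, parse it with its own extractor, and save the result.

    `fetch` is injectable so the pipeline can be exercised without network access.
    """
    raw = fetch(book_id)
    records = extractor(raw)
    return save_parsed(records, out_dir, name)
```

Passing the extractor as an argument mirrors the one-extractor-per-source design: a call such as `run_source(37499, extract_napoleon, "letters", "napoleon")` would cover one row of the source tables, assuming the extractor takes the raw text and returns a list of records.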
Regenerating web UI data
# Letters — run from project root
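import glob
import json

# Combined structure: per-author letter counts plus a flat list of compact records
out = {"authors": {}, "letters": []}
for path in sorted(glob.glob("letters/*.json")):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for letter in data:
        # Short keys keep the combined file small for the web UI
        out["letters"].append({"a": letter["author"], "r": letter["recipient"], "h": letter.get("heading", ""), "b": letter["body"], "s": letter["source"], "p": letter.get("period", "")})
        out["authors"].setdefault(letter["author"], 0)
        out["authors"][letter["author"]] += 1
with open("hicalsoft.github.io/letters/data/letters.json", "w", encoding="utf-8") as f:
    json.dump(out, f, separators=(",", ":"))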
# Poetry — same pattern with {authors, poems} structure
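For poetry, the same pattern might look like the sketch below. The per-poem field names (`author`, `title`, `body`, `source`) and the short output keys are assumptions made for illustration; check them against the actual files in `poetry/` and the keys the poetry page reads before relying on this. `combine_poetry` is a hypothetical helper name.

```python
import glob
import json
import os


def combine_poetry(src_dir="poetry", dest="hicalsoft.github.io/poetry/data/poetry.json"):
    """Merge per-source poem files into one compact {authors, poems} file.

    Assumption: each poem record has 'author', 'title', 'body' and 'source'
    keys; the real field names in poetry/*.json may differ.
    """
    out = {"authors": {}, "poems": []}
    for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
        with open(path, encoding="utf-8") as f:
            poems = json.load(f)
        for poem in poems:
            out["poems"].append({
                "a": poem["author"],
                "t": poem.get("title", ""),
                "b": poem["body"],
                "s": poem.get("source", ""),
            })
            # Count poems per author for the sidebar
            out["authors"][poem["author"]] = out["authors"].get(poem["author"], 0) + 1
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with open(dest, "w", encoding="utf-8") as f:
        json.dump(out, f, separators=(",", ":"))
    return out
```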
Gutenberg Parsing Notes
- Line endings: Always normalize with `.replace("\r\n", "\n").replace("\r", "\n")` before regex splitting
- START/END markers vary: `"*** START OF THE PROJECT GUTENBERG EBOOK"`, `"*** START OF THIS PROJECT GUTENBERG EBOOK"`, etc., so match them with a regex
- Each source needs a custom extractor due to unique formatting (Roman numerals, ALL CAPS titles, numbered entries, etc.)
- CONTENTS sections often duplicate the same headings as actual content — need to find the 2nd occurrence or verify context
- Poe is the most complex: uses section tracking (poem_sections vs non_poem_sections) to only extract from the 4 actual poetry sections, skipping memoir, notes, prose, essays
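The notes above can be sketched as a few small helpers. These are illustrative only and are not taken from the download scripts: the regexes cover just the "THE"/"THIS" marker variants quoted above, and the function names (`normalize_newlines`, `strip_boilerplate`, `find_nth`, `extract_from_sections`) and any section titles passed in are hypothetical.

```python
import re

# Match both the "THE" and "THIS" variants of the marker lines.
START_RE = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)


def normalize_newlines(text):
    """Convert \\r\\n and bare \\r to \\n so line-based regexes behave."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_boilerplate(text):
    """Return the text between the START and END markers, if present."""
    text = normalize_newlines(text)
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()


def find_nth(text, heading, n=2):
    """Return the offset of the n-th line equal to `heading`, or -1.

    Useful when a heading appears once in CONTENTS and again in the body.
    """
    pattern = re.compile(r"^[ \t]*" + re.escape(heading) + r"[ \t]*$", re.MULTILINE)
    matches = list(pattern.finditer(text))
    return matches[n - 1].start() if len(matches) >= n else -1


def extract_from_sections(text, poem_sections, non_poem_sections):
    """Section tracking: keep only lines under a heading in poem_sections."""
    keep = False
    kept = []
    for line in text.split("\n"):
        heading = line.strip()
        if heading in poem_sections:
            keep = True       # entering a poetry section
            continue
        if heading in non_poem_sections:
            keep = False      # entering memoir, notes, prose, etc.
            continue
        if keep:
            kept.append(line)
    return "\n".join(kept).strip()
```

Taking the second occurrence with `find_nth` is only a heuristic; a heading that appears more than once in the body would still need the context check mentioned above.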
Letter Sources (11)
| Source | Gutenberg ID | Extractor |
|---|---|---|
| Henry VIII to Anne Boleyn | 22009 | extract_henry_viii |
| Mary Wollstonecraft to Gilbert Imlay | 3529 | extract_wollstonecraft |
| Abelard & Héloïse | 35977 | extract_abelard_heloise |
| Napoleon to Josephine | 37499 | extract_napoleon |
| Keats to Fanny Brawne | 35698 | extract_keats_brawne |
| Browning Letters Vol 1 | 50400 | _extract_browning |
| Browning Letters Vol 2 | 51263 | _extract_browning |
| Burns to Clarinda | 6131 | extract_burns_clarinda |
| Dorothy Osborne | 34387 | extract_dorothy_osborne |
| Beethoven | 13065 | extract_beethoven |
| Mozart | 5307 | extract_mozart |
Poetry Sources (15)
| Source | Gutenberg ID | Extractor |
|---|---|---|
| Shakespeare Sonnets | 1041 | extract_shakespeare_sonnets |
| Emily Dickinson | 12242 | extract_dickinson |
| Walt Whitman | 1322 | extract_whitman |
| William Blake | 1934 | extract_blake |
| John Keats | 23684 | extract_keats |
| Edgar Allan Poe | 10031 | extract_poe |
| E.B. Browning Sonnets | 2002 | extract_browning_sonnets |
| T.S. Eliot | 1321 | extract_eliot_wasteland |
| Robert Frost (Mountain) | 29345 | extract_frost_mountain |
| Robert Frost (Selected) | 59824 | extract_frost_selected |
| W.B. Yeats | 32233 | extract_yeats |
| Omar Khayyám | 246 | extract_khayyam |
| Robert Burns | 1279 | extract_burns |
| William Wordsworth | 9622 | extract_wordsworth |
| Percy Shelley | 4800 | extract_shelley |
Web UI Architecture
Both /letters and /poetry pages are standalone SPAs (no Jekyll dependency). They:
- Load a single combined JSON file via `fetch()`
- Match the site's dark neumorphism theme (bg `#2b2d2f`, text `#fff`)
- Letters uses red accent (`#ff073a`), Poetry uses purple accent (`#c084fc`)
- Feature: author/poet sidebar, card grid, random button, detail view with font controls
- Keyboard nav: ←/→ arrows, Escape to go back, R for random
- Font auto-fit: calculates ideal font size from container width and longest line length
- Manual A+/A− buttons override auto; click "auto" label to reset
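The font auto-fit arithmetic can be sketched as follows. It is written in Python to match the rest of the project, whereas the pages themselves would do this in browser JavaScript. The character-width ratio, the clamp bounds, and the function names are all assumptions for illustration, not values read from the site.

```python
def auto_font_size(container_width_px, text, char_width_ratio=0.6, min_px=10.0, max_px=22.0):
    """Estimate a font size (px) at which the longest line fits the container.

    Assumptions: each character is roughly char_width_ratio * font_size wide,
    and the result is clamped to [min_px, max_px]. All defaults are
    illustrative, not values taken from the site.
    """
    longest = max(len(line) for line in text.split("\n"))
    if longest == 0:
        return max_px
    ideal = container_width_px / (longest * char_width_ratio)
    return max(min_px, min(max_px, ideal))


def effective_font_size(auto_px, manual_px=None):
    """A manual A+/A- choice overrides auto; None means 'auto' is active."""
    return auto_px if manual_px is None else manual_px
```

With these assumed defaults, a 600 px container and a longest line of 50 characters give 600 / (50 × 0.6) = 20 px; much longer lines fall back to the minimum size.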
Change Log
Fix Poe parser and add font size controls
- Rewrote `extract_poe()` with section tracking (poem_sections vs non_poem_sections)
- Only extracts from 4 poetry sections, skips memoir/notes/prose/essays/dedications
- Result: 51 clean poems (was 108 with junk)
- Added `_is_title()`, `_save_current()`, `_norm()` helper functions
- Added skip_titles set for sub-headings (PREFACE, dedications, etc.)
- Renames "Part I/II" → "Al Aaraaf — Part I/II"
Add poetry collection
- Created `download_poetry.py` with 15 Gutenberg extractors
- 3,098 poems from 15 sources stored in `poetry/` as JSON
- Created `/poetry` web page matching site theme
Remove letter truncation
- Removed `truncate_letter()`; the full letter text is now shown
Restructure: pre-downloaded letters
- Moved from download-on-run to pre-parsed JSON in `letters/`
- Added 6 new sources (Browning, Burns, Osborne, Beethoven, Mozart)
- 1,307 letters from 11 sources
Initial love letters app
- Created `love_letters.py` CLI with 5 initial sources
- `download_letters.py` for fetching/parsing Gutenberg texts