diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..4853422
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,142 @@
+# CLAUDE.md — Letters & Poetry Project
+
+## Project Overview
+
+A Python project that downloads, parses, and displays historic love letters and classic poetry from Project Gutenberg. Data is pre-parsed and committed to git so end users don't need to download anything. A web UI is served via the `hicalsoft.github.io` GitHub Pages site.
+
+## Repository Structure
+
+```
+├── download_letters.py          # Downloads & parses 11 letter sources from Gutenberg
+├── download_poetry.py           # Downloads & parses 15 poetry sources from Gutenberg
+├── love_letters.py              # CLI app: displays random letters in the terminal
+├── letters/                     # Pre-parsed letter JSON files (11 sources, ~1,307 letters)
+├── poetry/                      # Pre-parsed poetry JSON files (15 sources, ~3,098 poems)
+├── hicalsoft.github.io/         # Embedded repo for GitHub Pages web UI (separate git history)
+│   ├── letters/index.html       # Standalone SPA for browsing letters
+│   ├── letters/data/letters.json
+│   ├── poetry/index.html        # Standalone SPA for browsing poetry
+│   └── poetry/data/poetry.json
+└── README.md
+```
+
+## Key Commands
+
+```bash
+# Download all letter sources (requires internet)
+python3 download_letters.py
+
+# Download all poetry sources (requires internet)
+python3 download_poetry.py
+
+# Run CLI app (no internet needed — reads from letters/ directory)
+python3 love_letters.py                    # Random letter
+python3 love_letters.py --list             # List sources
+python3 love_letters.py --source napoleon  # Filter by source
+```
+
+## Data Pipeline
+
+1. **Download scripts** fetch raw `.txt` files from Project Gutenberg via `urllib`
+2. Each source has a **custom extractor function** that parses the Gutenberg text format
+3. Parsed data is saved as JSON to `letters/` or `poetry/` directories
+4. A separate step combines the individual JSON files into `letters.json` / `poetry.json` for the web UI (these live in `hicalsoft.github.io/*/data/`)
+
+### Regenerating web UI data
+
+```python
+# Letters — run from project root
+import json, os, glob
+out = {"authors": {}, "letters": []}
+for f in sorted(glob.glob("letters/*.json")):
+    data = json.load(open(f))
+    for l in data:
+        out["letters"].append({"a": l["author"], "r": l["recipient"], "h": l.get("heading",""), "b": l["body"], "s": l["source"], "p": l.get("period","")})
+        out["authors"].setdefault(l["author"], 0)
+        out["authors"][l["author"]] += 1
+json.dump(out, open("hicalsoft.github.io/letters/data/letters.json","w"), separators=(",",":"))
+
+# Poetry — same pattern with {authors, poems} structure
+```
+
+## Gutenberg Parsing Notes
+
+- **Line endings**: Always normalize with `.replace("\r\n", "\n").replace("\r", "\n")` before regex splitting
+- **START/END markers** vary: `"*** START OF THE PROJECT GUTENBERG EBOOK"`, `"*** START OF THIS PROJECT GUTENBERG EBOOK"`, etc. — use regex
+- **Each source needs a custom extractor** due to unique formatting (Roman numerals, ALL CAPS titles, numbered entries, etc.)
+- **CONTENTS sections** often duplicate the same headings as actual content — need to find the 2nd occurrence or verify context
+- **Poe** is the most complex: uses section tracking (poem_sections vs non_poem_sections) to only extract from the 4 actual poetry sections, skipping memoir, notes, prose, essays
+
+## Letter Sources (11)
+
+| Source | Gutenberg ID | Extractor |
+|--------|-------------|-----------|
+| Henry VIII to Anne Boleyn | 22009 | `extract_henry_viii` |
+| Mary Wollstonecraft to Gilbert Imlay | 3529 | `extract_wollstonecraft` |
+| Abelard & Héloïse | 35977 | `extract_abelard_heloise` |
+| Napoleon to Josephine | 37499 | `extract_napoleon` |
+| Keats to Fanny Brawne | 35698 | `extract_keats_brawne` |
+| Browning Letters Vol 1 | 50400 | `_extract_browning` |
+| Browning Letters Vol 2 | 51263 | `_extract_browning` |
+| Burns to Clarinda | 6131 | `extract_burns_clarinda` |
+| Dorothy Osborne | 34387 | `extract_dorothy_osborne` |
+| Beethoven | 13065 | `extract_beethoven` |
+| Mozart | 5307 | `extract_mozart` |
+
+## Poetry Sources (15)
+
+| Source | Gutenberg ID | Extractor |
+|--------|-------------|-----------|
+| Shakespeare Sonnets | 1041 | `extract_shakespeare_sonnets` |
+| Emily Dickinson | 12242 | `extract_dickinson` |
+| Walt Whitman | 1322 | `extract_whitman` |
+| William Blake | 1934 | `extract_blake` |
+| John Keats | 23684 | `extract_keats` |
+| Edgar Allan Poe | 10031 | `extract_poe` |
+| E.B. Browning Sonnets | 2002 | `extract_browning_sonnets` |
+| T.S. Eliot | 1321 | `extract_eliot_wasteland` |
+| Robert Frost (Mountain) | 29345 | `extract_frost_mountain` |
+| Robert Frost (Selected) | 59824 | `extract_frost_selected` |
+| W.B. Yeats | 32233 | `extract_yeats` |
+| Omar Khayyám | 246 | `extract_khayyam` |
+| Robert Burns | 1279 | `extract_burns` |
+| William Wordsworth | 9622 | `extract_wordsworth` |
+| Percy Shelley | 4800 | `extract_shelley` |
+
+## Web UI Architecture
+
+Both `/letters` and `/poetry` pages are **standalone SPAs** (no Jekyll dependency). They:
+- Load a single combined JSON file via `fetch()`
+- Match the site's dark neumorphism theme (bg `#2b2d2f`, text `#fff`)
+- Letters uses red accent (`#ff073a`), Poetry uses purple accent (`#c084fc`)
+- Feature: author/poet sidebar, card grid, random button, detail view with font controls
+- Keyboard nav: ←/→ arrows, Escape to go back, R for random
+- Font auto-fit: calculates ideal font size from container width and longest line length
+- Manual A+/A− buttons override auto; click "auto" label to reset
+
+## Change Log
+
+### Fix Poe parser and add font size controls
+- Rewrote `extract_poe()` with section tracking (poem_sections vs non_poem_sections)
+- Only extracts from 4 poetry sections, skips memoir/notes/prose/essays/dedications
+- Result: 51 clean poems (was 108 with junk)
+- Added `_is_title()`, `_save_current()`, `_norm()` helper functions
+- Added skip_titles set for sub-headings (PREFACE, dedications, etc.)
+- Renames "Part I/II" → "Al Aaraaf — Part I/II"
+
+### Add poetry collection
+- Created `download_poetry.py` with 15 Gutenberg extractors
+- 3,098 poems from 15 sources stored in `poetry/` as JSON
+- Created `/poetry` web page matching site theme
+
+### Remove letter truncation
+- Removed `truncate_letter()` — shows full letter text
+
+### Restructure: pre-downloaded letters
+- Moved from download-on-run to pre-parsed JSON in `letters/`
+- Added 6 new sources (Browning, Burns, Osborne, Beethoven, Mozart)
+- 1,307 letters from 11 sources
+
+### Initial love letters app
+- Created `love_letters.py` CLI with 5 initial sources
+- `download_letters.py` for fetching/parsing Gutenberg texts
diff --git a/hicalsoft.github.io b/hicalsoft.github.io
index 7fd16c0..c292f8e 160000
--- a/hicalsoft.github.io
+++ b/hicalsoft.github.io
@@ -1 +1 @@
-Subproject commit 7fd16c08b20d81da497d2efb44af2e83860382f4
+Subproject commit c292f8eed1677afa7de015a8a32098f7b3f52956
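
The "Gutenberg Parsing Notes" in the CLAUDE.md added by this diff describe two preprocessing steps: normalizing line endings, and using a regex for the START/END markers because their wording varies. The sketch below shows one way those two steps could fit together. The names `normalize_newlines`, `strip_gutenberg_boilerplate`, `START_RE`, and `END_RE` are hypothetical and are not taken from `download_letters.py` or `download_poetry.py`. The patterns cover only the two marker variants quoted in the notes, so other variants would need extending.

```python
import re

# Assumed patterns: they match only the "THE"/"THIS" marker variants quoted
# in the parsing notes; real Gutenberg files may use other wordings.
START_RE = re.compile(r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\n")
END_RE = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK")


def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF, as the notes recommend."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_gutenberg_boilerplate(raw):
    """Return the text between the START and END markers (hypothetical helper).

    If a marker is missing, fall back to the start or end of the file.
    """
    text = normalize_newlines(raw)
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end and end.start() >= begin else len(text)
    return text[begin:stop].strip()
```

A source-specific extractor would then run on the returned body, which is where the per-source handling of Roman numerals, ALL CAPS titles, and duplicated CONTENTS headings would apply.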