CLAUDE.md — Letters & Poetry Project

Project Overview

A Python project that downloads, parses, and displays historic love letters and classic poetry from Project Gutenberg. Data is pre-parsed and committed to git so end users don't need to download anything. A web UI is served via the hicalsoft.github.io GitHub Pages site.

Repository Structure

├── download_letters.py          # Downloads & parses 11 letter sources from Gutenberg
├── download_poetry.py           # Downloads & parses 15 poetry sources from Gutenberg
├── love_letters.py              # CLI app: displays random letters in the terminal
├── letters/                     # Pre-parsed letter JSON files (11 sources, ~1,307 letters)
├── poetry/                      # Pre-parsed poetry JSON files (15 sources, ~3,098 poems)
├── hicalsoft.github.io/         # Embedded repo for GitHub Pages web UI (separate git history)
│   ├── letters/index.html       # Standalone SPA for browsing letters
│   ├── letters/data/letters.json
│   ├── poetry/index.html        # Standalone SPA for browsing poetry
│   └── poetry/data/poetry.json
└── README.md

Key Commands

# Download all letter sources (requires internet)
python3 download_letters.py

# Download all poetry sources (requires internet)
python3 download_poetry.py

# Run CLI app (no internet needed — reads from letters/ directory)
python3 love_letters.py                  # Random letter
python3 love_letters.py --list           # List sources
python3 love_letters.py --source napoleon  # Filter by source

Data Pipeline

Download scripts fetch raw .txt files from Project Gutenberg via urllib
Each source has a custom extractor function that parses the Gutenberg text format
Parsed data is saved as JSON to letters/ or poetry/ directories
A separate step combines the individual JSON files into letters.json / poetry.json for the web UI (these live in hicalsoft.github.io/*/data/)
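
The first three steps can be sketched as below. This is an illustration under stated assumptions, not the project's code: the URL pattern in fetch_text is one common Gutenberg plain-text location and may differ from what the download scripts use, and extract_paragraphs is a hypothetical stand-in for the per-source extractors, which are considerably more involved.

```python
import json
import os
import urllib.request


def fetch_text(book_id):
    """Download one Gutenberg plain-text file (the URL pattern is an assumption)."""
    url = f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_paragraphs(text, author, source):
    """Hypothetical stand-in extractor: one entry per blank-line-separated block."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    return [{"author": author, "source": source, "body": block} for block in blocks]


def save_entries(entries, out_dir, name):
    """Write parsed entries to <out_dir>/<name>.json and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, ensure_ascii=False, indent=2)
    return path


def run_source(book_id, name, author, source, out_dir="letters"):
    """Fetch, extract, and save a single source (requires internet)."""
    entries = extract_paragraphs(fetch_text(book_id), author, source)
    return save_entries(entries, out_dir, name)
```

Keeping the network call in its own function means the extractor and the save step can be exercised on local text without touching Project Gutenberg.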

Regenerating web UI data

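# Letters: run from the project root
import glob
import json

out = {"authors": {}, "letters": []}
for path in sorted(glob.glob("letters/*.json")):
    with open(path, encoding="utf-8") as fh:  # close each file after reading
        letters = json.load(fh)
    for letter in letters:
        # Map the long field names to the short keys used in the combined file
        out["letters"].append({"a": letter["author"], "r": letter["recipient"], "h": letter.get("heading", ""), "b": letter["body"], "s": letter["source"], "p": letter.get("period", "")})
        # Count letters per author
        out["authors"][letter["author"]] = out["authors"].get(letter["author"], 0) + 1
with open("hicalsoft.github.io/letters/data/letters.json", "w", encoding="utf-8") as fh:
    json.dump(out, fh, separators=(",", ":"))  # compact output, no extra whitespace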

# Poetry — same pattern with {authors, poems} structure
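
For poetry, the same pattern might look like the sketch below. Only the top-level {authors, poems} shape comes from the note above. The per-poem field names ("author", "title", "body", "source") and the short keys ("a", "t", "b", "s") are assumptions, so check them against the files in poetry/ and against what poetry/index.html expects before relying on this.

```python
import glob
import json
import os


def combine_poems(src_dir="poetry",
                  out_path="hicalsoft.github.io/poetry/data/poetry.json"):
    """Merge poetry/*.json into one compact file; key names are assumptions."""
    out = {"authors": {}, "poems": []}
    for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
        with open(path, encoding="utf-8") as fh:
            poems = json.load(fh)
        for poem in poems:
            out["poems"].append({
                "a": poem["author"],
                "t": poem.get("title", ""),   # assumed field name
                "b": poem["body"],
                "s": poem.get("source", ""),  # assumed field name
            })
            # Count poems per author
            out["authors"][poem["author"]] = out["authors"].get(poem["author"], 0) + 1
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(out, fh, separators=(",", ":"))
    return out
```

Run from the project root, combine_poems() with its default arguments writes to the path the poetry page loads its data from.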

Gutenberg Parsing Notes

Line endings: Always normalize with .replace("\r\n", "\n").replace("\r", "\n") before regex splitting
START/END markers vary: "*** START OF THE PROJECT GUTENBERG EBOOK", "*** START OF THIS PROJECT GUTENBERG EBOOK", etc. — use regex
Each source needs a custom extractor due to unique formatting (Roman numerals, ALL CAPS titles, numbered entries, etc.)
CONTENTS sections often repeat the headings used in the body, so match the 2nd occurrence of a heading or verify the surrounding context before treating a match as real content
Poe is the most complex: uses section tracking (poem_sections vs non_poem_sections) to only extract from the 4 actual poetry sections, skipping memoir, notes, prose, essays
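
The notes above can be illustrated with a few small helpers. This is a sketch under assumptions, not the code in the download scripts: the function names are hypothetical, the marker regexes cover only the two START/END variants quoted above, and collect_from_sections is a simplified picture of section tracking rather than the real extract_poe().

```python
import re

# Covers the "THE" and "THIS" variants; other variants would need additions.
START_RE = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)
END_RE = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)


def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF before any regex splitting."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_gutenberg_wrapper(text):
    """Keep only the text between the START and END marker lines, if present."""
    text = normalize_newlines(text)
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(text)
    return text[begin:finish].strip()


def body_heading_offset(text, heading):
    """Offset of the heading's 2nd occurrence (past CONTENTS), else its only one, else -1."""
    pattern = re.compile(rf"^[ \t]*{re.escape(heading)}[ \t]*$", re.M)
    hits = [m.start() for m in pattern.finditer(text)]
    if not hits:
        return -1
    return hits[1] if len(hits) > 1 else hits[0]


def collect_from_sections(lines, wanted_sections, skipped_sections):
    """Section tracking: keep lines only while the latest section heading is wanted."""
    keep, kept = False, []
    for line in lines:
        key = line.strip().upper()
        if key in wanted_sections:
            keep = True
        elif key in skipped_sections:
            keep = False
        elif keep:
            kept.append(line)
    return kept
```

Taking the second occurrence is only a heuristic. A heading that appears once in the body and never in CONTENTS still resolves correctly, but a heading repeated within the body itself would need the context check mentioned above.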

Letter Sources (11)

Source	Gutenberg ID	Extractor
Henry VIII to Anne Boleyn	22009	`extract_henry_viii`
Mary Wollstonecraft to Gilbert Imlay	3529	`extract_wollstonecraft`
Abelard & Héloïse	35977	`extract_abelard_heloise`
Napoleon to Josephine	37499	`extract_napoleon`
Keats to Fanny Brawne	35698	`extract_keats_brawne`
Browning Letters Vol 1	50400	`_extract_browning`
Browning Letters Vol 2	51263	`_extract_browning`
Burns to Clarinda	6131	`extract_burns_clarinda`
Dorothy Osborne	34387	`extract_dorothy_osborne`
Beethoven	13065	`extract_beethoven`
Mozart	5307	`extract_mozart`

Poetry Sources (15)

Source	Gutenberg ID	Extractor
Shakespeare Sonnets	1041	`extract_shakespeare_sonnets`
Emily Dickinson	12242	`extract_dickinson`
Walt Whitman	1322	`extract_whitman`
William Blake	1934	`extract_blake`
John Keats	23684	`extract_keats`
Edgar Allan Poe	10031	`extract_poe`
E.B. Browning Sonnets	2002	`extract_browning_sonnets`
T.S. Eliot	1321	`extract_eliot_wasteland`
Robert Frost (Mountain)	29345	`extract_frost_mountain`
Robert Frost (Selected)	59824	`extract_frost_selected`
W.B. Yeats	32233	`extract_yeats`
Omar Khayyám	246	`extract_khayyam`
Robert Burns	1279	`extract_burns`
William Wordsworth	9622	`extract_wordsworth`
Percy Shelley	4800	`extract_shelley`

Web UI Architecture

Both /letters and /poetry pages are standalone SPAs (no Jekyll dependency). They:

Load a single combined JSON file via fetch()
Match the site's dark neumorphism theme (bg #2b2d2f, text #fff)
Letters uses red accent (#ff073a), Poetry uses purple accent (#c084fc)
Features: author/poet sidebar, card grid, random button, detail view with font controls
Keyboard nav: ←/→ arrows, Escape to go back, R for random
Font auto-fit: calculates ideal font size from container width and longest line length
Manual A+/A− buttons override auto; click "auto" label to reset
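
The font auto-fit arithmetic can be sketched as follows. The pages themselves run in the browser, so this Python version only illustrates the calculation. The average glyph width of 0.6 times the font size and the 10 to 22 px clamp are assumptions, not values taken from the pages.

```python
def longest_line_length(text):
    """Length in characters of the longest line in a letter or poem."""
    return max((len(line) for line in text.split("\n")), default=0)


def auto_font_size(container_px, longest_line_chars,
                   char_width_ratio=0.6, min_px=10.0, max_px=22.0):
    """Largest font size (px) at which the longest line still fits the container.

    char_width_ratio, min_px, and max_px are assumed values for illustration.
    """
    if longest_line_chars <= 0:
        return max_px
    # A line of n characters is roughly n * ratio * font_size pixels wide.
    ideal = container_px / (longest_line_chars * char_width_ratio)
    return max(min_px, min(max_px, ideal))
```

With these assumed constants, a 600 px container and a longest line of 50 characters give about 20 px. Very long lines are clamped to the minimum and very short lines to the maximum.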

Change Log

Fix Poe parser and add font size controls

Rewrote extract_poe() with section tracking (poem_sections vs non_poem_sections)
Only extracts from 4 poetry sections, skips memoir/notes/prose/essays/dedications
Result: 51 clean poems (was 108 with junk)
Added _is_title(), _save_current(), _norm() helper functions
Added skip_titles set for sub-headings (PREFACE, dedications, etc.)
Renamed "Part I/II" → "Al Aaraaf — Part I/II"

Add poetry collection

Created download_poetry.py with 15 Gutenberg extractors
3,098 poems from 15 sources stored in poetry/ as JSON
Created /poetry web page matching site theme

Remove letter truncation

Removed truncate_letter() — shows full letter text

Restructure: pre-downloaded letters

Moved from download-on-run to pre-parsed JSON in letters/
Added 6 new sources (Browning Vol 1 and Vol 2, Burns, Osborne, Beethoven, Mozart)
1,307 letters from 11 sources

Initial love letters app

Created love_letters.py CLI with 5 initial sources
download_letters.py for fetching/parsing Gutenberg texts
