@gurupanguji

Normalize Legacy wp:quote Blocks Design

Goal

Normalize legacy wp:quote blocks across all posts in _posts/ where a clean mechanical conversion exists, replacing them with native markdown blockquotes plus a separate Source: line when citation data is present.

Problem

Issue #100 follows the weekly roundup work from #99.

The repository still contains many historical posts with legacy WordPress quote markup:


<blockquote class="wp-block-quote">
optional nested paragraph wrappers
optional <cite><a ...>...</a></cite> attribution

This older markup makes the content harder to normalize, harder to parse, and less aligned with the repository’s Jekyll-native markup direction.

The goal here is not a full HTML migration project. The goal is a safe batch conversion of straightforward quote blocks only.

Design

1. Scope the cleanup across all posts, not only `🔗` posts

The cleanup pass should scan all markdown posts in _posts/, not just 🔗 link posts.

Reason:

legacy wp:quote exists outside the link-post subset
a repository-wide safe normalization pass reduces future parser debt more effectively
the conversion rule is markup-based, not category-based
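A minimal sketch of the repository-wide scan, for illustration only. The helper name find_posts_with_quotes, the marker string, and the *.md glob are assumptions; the real posts may use other extensions or marker spacing.

```python
from pathlib import Path

# Assumed opening marker for the legacy block; the spec names it only as wp:quote.
QUOTE_MARKER = "<!-- wp:quote"


def find_posts_with_quotes(posts_dir):
    """Return every markdown post under posts_dir that contains the marker.

    The filter looks at markup only, never at categories or link-post status.
    """
    matches = []
    for path in sorted(Path(posts_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        if QUOTE_MARKER in text:
            matches.append(path)
    return matches
```

Calling find_posts_with_quotes("_posts") would list candidate files without modifying anything.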

2. Add a narrow normalization script

Add a script, likely scripts/normalize_wp_quotes.py.

Responsibilities:

scan posts for legacy wp:quote markup
identify safe-shape quote blocks
convert only safe candidates
print a summary of:
- converted files
- skipped files
- skip reasons

The script should default to dry-run mode and require an explicit --write flag to edit files in place.
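One way the dry-run default and the summary report could look, as a sketch. Only the --write flag comes from this design; build_parser, format_summary, and the report layout are hypothetical.

```python
import argparse


def build_parser():
    """Build the CLI parser; dry-run is the default because --write is opt-in."""
    parser = argparse.ArgumentParser(
        description="Normalize legacy wp:quote blocks (sketch)."
    )
    parser.add_argument(
        "--write",
        action="store_true",
        help="edit files in place; without it, only report what would change",
    )
    return parser


def format_summary(converted, skipped, write):
    """Render the report: converted files, skipped files, and skip reasons.

    `converted` is a list of paths; `skipped` maps path -> reason.
    """
    mode = "write" if write else "dry-run"
    lines = [f"mode: {mode}", f"converted: {len(converted)}"]
    lines += [f"  {path}" for path in converted]
    lines.append(f"skipped: {len(skipped)}")
    lines += [f"  {path}: {reason}" for path, reason in sorted(skipped.items())]
    return "\n".join(lines)
```

Because store_true defaults to False, running the script with no arguments can never edit a file.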

3. Convert only safe-shape quote blocks

A quote block is a safe candidate only when:

it is a straightforward wp:quote block
its content is text paragraphs that map cleanly to markdown blockquote lines
any citation is a simple <cite><a ...>Title</a></cite> form that can be rendered cleanly
the surrounding markup before and after the block can remain untouched

Converted output shape:

quoted text becomes markdown blockquote lines
citation moves out of the quote and becomes a separate line:
- Source: [Title](url)

The conversion should preserve meaning and attribution, but it should not preserve WordPress block wrappers.
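The output shape above can be sketched as follows. This is an illustration under assumptions, not the final converter: convert_quote_block and both regular expressions are hypothetical, the input is assumed to have already passed the safe-shape checks, and HTML entities and inline tags are left as they are.

```python
import re

# Assumed shapes: plain <p> paragraphs and one <cite><a href="...">Title</a></cite>.
PARAGRAPH_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)
CITE_RE = re.compile(
    r'<cite>\s*<a\s+[^>]*?href="([^"]+)"[^>]*>(.*?)</a>\s*</cite>', re.DOTALL
)


def convert_quote_block(block_html):
    """Turn one safe-shape quote block into markdown blockquote lines.

    A citation, if present, is moved out of the quote onto its own Source: line.
    """
    cite = CITE_RE.search(block_html)
    body = CITE_RE.sub("", block_html)
    lines = []
    for index, paragraph in enumerate(PARAGRAPH_RE.findall(body)):
        if index:
            lines.append(">")  # empty quote line keeps paragraphs separate
        lines.append("> " + " ".join(paragraph.split()))
    output = "\n".join(lines)
    if cite:
        url, title = cite.group(1), cite.group(2).strip()
        output += f"\n\nSource: [{title}]({url})"
    return output
```

The WordPress wrappers (the blockquote element and its class) never reach the output; only the paragraph text and the citation link do.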

4. Preserve surrounding content without style rewrites

The script should touch only the targeted quote block.

It should preserve:

front matter
dates
categories and tags
authored commentary
images
embeds
other non-quote block content

It should not:

rewrite prose
normalize unrelated legacy WordPress blocks
alter spacing beyond what is needed to make the converted markdown readable
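A sketch of how the script might leave everything outside the quote block byte-for-byte intact. The function name replace_quote_block and the block pattern are assumptions, and it only handles the single-block case.

```python
import re

# Assumed block delimiters: opening and closing wp:quote comment markers.
BLOCK_RE = re.compile(
    r"<!--\s*wp:quote\b.*?-->.*?<!--\s*/wp:quote\s*-->", re.DOTALL
)


def replace_quote_block(post_text, replacement):
    """Swap the first wp:quote block for `replacement`, touching nothing else.

    Text before and after the block is sliced, not reparsed, so front matter,
    commentary, images, and embeds cannot be rewritten by accident.
    """
    match = BLOCK_RE.search(post_text)
    if match is None:
        return post_text
    return post_text[: match.start()] + replacement + post_text[match.end():]
```

Slicing around the match, rather than round-tripping the whole file through a parser, is what keeps the diff limited to the converted block.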

5. Skip ambiguous or risky quote structures

The script should skip, not guess, when it sees:

nested wp:quote blocks
repeated wp:quote blocks in the same post
quote blocks containing images or non-paragraph inner structures
malformed HTML or mismatched block markers
citation structures that cannot be rendered cleanly as Source: [Title](url)

Every skip should include a reason so the remaining debt is visible and reviewable.
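The skip rules above could be expressed as a function that returns a reason string, or None for a safe candidate. This is a hedged sketch: skip_reason, the regular expressions, the list of disallowed tags, and the reason wording are all hypothetical.

```python
import re

OPEN_RE = re.compile(r"<!--\s*wp:quote\b.*?-->", re.DOTALL)
CLOSE_RE = re.compile(r"<!--\s*/wp:quote\s*-->")
# Assumed set of inner structures that do not map to plain quote lines.
NON_PARAGRAPH_RE = re.compile(r"<(img|figure|ul|ol|table|pre|iframe)\b", re.IGNORECASE)
SIMPLE_CITE_RE = re.compile(
    r'<cite>\s*<a\s[^>]*href="[^"]+"[^>]*>[^<]+</a>\s*</cite>'
)


def skip_reason(post_text):
    """Return why a post must be skipped, or None if its quote block looks safe."""
    opens = list(OPEN_RE.finditer(post_text))
    closes = list(CLOSE_RE.finditer(post_text))
    if not opens and not closes:
        return "no wp:quote block"
    if len(opens) != len(closes):
        return "mismatched block markers"
    if len(opens) > 1:
        # A second opener before the first closer means the blocks are nested.
        if opens[1].start() < closes[0].start():
            return "nested wp:quote blocks"
        return "repeated wp:quote blocks"
    if closes[0].start() < opens[0].end():
        return "mismatched block markers"
    inner = post_text[opens[0].end(): closes[0].start()]
    if NON_PARAGRAPH_RE.search(inner):
        return "quote contains non-paragraph content"
    if "<cite" in inner and not SIMPLE_CITE_RE.search(inner):
        return "citation is not a simple linked title"
    return None
```

Returning the reason as data, instead of logging it at the point of detection, lets the summary report group skipped files by cause.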

6. Keep the conversion mechanical and reviewable

This issue is a normalization pass, not an editorial pass.

That means:

convert safe markup mechanically
do not rewrite sentence flow
do not remove unusual diction
do not “improve” the writing while touching the file

This keeps the batch diff reviewable and preserves authored signal.

File Changes

New

docs/superpowers/specs/2026-03-26-normalize-wp-quotes-design.md
scripts/normalize_wp_quotes.py
test coverage for the normalization script

Modify

safe candidate posts in _posts/ that can be converted cleanly

Verification

Automated

Add unit coverage for:

safe quote extraction from wp:quote
paragraph-to-markdown blockquote conversion
cite-to-Source: conversion
dry-run reporting
skip behavior for:
- nested quotes
- repeated quotes in one post
- quote blocks containing images
- malformed quote blocks
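A small illustration of what the unit coverage might look like for the paragraph-to-blockquote case, using the standard unittest module. The helper paragraphs_to_blockquote is a hypothetical stand-in defined here so the example is self-contained; the real tests would import from the normalization script instead.

```python
import unittest


def paragraphs_to_blockquote(paragraphs):
    """Hypothetical helper under test: join paragraphs as markdown quote lines."""
    return "\n>\n".join("> " + paragraph for paragraph in paragraphs)


class ParagraphConversionTest(unittest.TestCase):
    def test_single_paragraph(self):
        self.assertEqual(paragraphs_to_blockquote(["One."]), "> One.")

    def test_paragraphs_are_separated_by_empty_quote_line(self):
        self.assertEqual(
            paragraphs_to_blockquote(["One.", "Two."]), "> One.\n>\n> Two."
        )


def run_tests():
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParagraphConversionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The skip cases listed above would follow the same pattern, with one test per skip reason asserting on the reported reason rather than on converted output.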

Manual

After the script runs with --write:

inspect a representative sample of converted posts
verify that surrounding content stayed intact
verify that Source: lines render cleanly

Repository Checks

Run the normal repository validators after conversion so the cleanup does not introduce date or snippet regressions.

Risks

Historical markup varies more than expected

Some old WordPress exports may look similar while carrying different internal structure. The safe response is to skip those files, not to widen the converter until it starts guessing.

Broad batch diffs can hide mistakes

Because this cleanup can touch many posts, the conversion must stay mechanical and the skip report must be explicit so review stays tractable.

Markdown conversion can subtly change spacing

Even when meaning is preserved, a broad conversion can alter blank-line spacing. The script should aim for minimal readable markdown and avoid unrelated layout churn.

Acceptance Criteria

A new normalization script scans all posts in _posts/ for legacy wp:quote markup.
The script defaults to dry-run and requires --write to edit files.
Safe quote blocks are converted to native markdown blockquotes.
Citation links are moved below the quote as Source: [Title](url).
Ambiguous quote structures are skipped with explicit reasons.
Surrounding non-quote content remains unchanged aside from minimal spacing needed for readable markdown.
Unit tests cover both successful conversions and skip cases.
Repository validation passes after the converted batch is written.