@gurupanguji

Normalize Legacy wp:quote Blocks Design

Goal

Normalize legacy wp:quote blocks across all posts in _posts/ where a clean mechanical conversion exists, replacing them with native markdown blockquotes plus a separate Source: line when citation data is present.

Problem

Issue #100 follows the weekly roundup work from #99.

The repository still contains many historical posts with legacy WordPress quote markup:

This older markup makes the content harder to normalize, harder to parse, and less aligned with the repository’s Jekyll-native markup direction.

The goal here is not a full HTML migration project. The goal is a safe batch conversion of straightforward quote blocks only.

Design

1. Scope the cleanup across all posts, not only 🔗 posts

The cleanup pass should scan all markdown posts in _posts/, not just 🔗 link posts.
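A minimal sketch of that scan, assuming posts use the `.md` extension and that legacy blocks open with the comment `<!-- wp:quote`. The function name `find_posts_with_legacy_quotes` and the `LEGACY_MARKER` constant are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumption: legacy blocks start with this WordPress block comment.
LEGACY_MARKER = "<!-- wp:quote"


def find_posts_with_legacy_quotes(posts_dir):
    """Return the sorted paths of markdown posts that contain the legacy marker."""
    matches = []
    for path in sorted(Path(posts_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        if LEGACY_MARKER in text:
            matches.append(path)
    return matches
```

The scan deliberately ignores post type, so link posts and regular posts are treated the same way.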

Reason:

2. Add a narrow normalization script

Add a script, likely scripts/normalize_wp_quotes.py.

Responsibilities:

The script should default to dry-run mode and require an explicit --write flag to edit files in place.
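One way the dry-run default could look with `argparse`, as a sketch only. The `--write` flag comes from this design; the `--posts-dir` option and the function names are assumptions added for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Normalize legacy wp:quote blocks in posts."
    )
    # Hypothetical option: lets the posts directory be overridden in tests.
    parser.add_argument("--posts-dir", default="_posts")
    # store_true makes the flag False unless passed, so dry-run is the default.
    parser.add_argument(
        "--write",
        action="store_true",
        help="edit files in place; without this flag the script only reports",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    mode = "write" if args.write else "dry-run"
    print(f"mode: {mode}")
    return mode
```

Because `--write` is opt-in, running the script with no arguments can never modify a post.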

3. Convert only safe-shape quote blocks

A quote block is a safe candidate only when:

Converted output shape:

The conversion should preserve meaning and attribution, but it should not preserve WordPress block wrappers.
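The sketch below illustrates one possible conversion. It makes two assumptions: a legacy block is a bare `<!-- wp:quote -->` comment pair around a single `<blockquote>` holding plain `<p>` paragraphs and at most one `<cite><a href="...">Title</a></cite>`, and the output is `> ` lines followed by a `Source: [Title](url)` line. The regexes and the `convert_block` name are hypothetical. Any block that does not match this assumed shape returns `None` so the caller can skip it.

```python
import re

# Assumed legacy shape: comment wrapper around one <blockquote>.
QUOTE_BLOCK = re.compile(
    r"<!-- wp:quote -->\s*"
    r"<blockquote[^>]*>(?P<body>.*?)</blockquote>\s*"
    r"<!-- /wp:quote -->",
    re.DOTALL,
)
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)
CITE = re.compile(
    r'<cite>\s*<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*</cite>',
    re.DOTALL,
)


def convert_block(block):
    """Return markdown for one legacy block, or None if the shape is not safe."""
    match = QUOTE_BLOCK.fullmatch(block.strip())
    if match is None:
        return None
    body = match.group("body")
    cite = CITE.search(body)
    if cite:
        body = body[: cite.start()] + body[cite.end():]
    paragraphs = PARAGRAPH.findall(body)
    # Anything left over besides plain paragraphs is a shape we do not understand.
    leftover = PARAGRAPH.sub("", body).strip()
    if not paragraphs or leftover:
        return None
    # Nested HTML inside a paragraph is treated as ambiguous in this sketch.
    if any("<" in p for p in paragraphs):
        return None
    quote = "\n>\n".join("> " + " ".join(p.split()) for p in paragraphs)
    if cite:
        title = cite.group("title").strip()
        return f"{quote}\n\nSource: [{title}]({cite.group('url')})"
    return quote
```

Returning `None` instead of a best-effort guess keeps the decision to skip in one place.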

4. Preserve surrounding content without style rewrites

The script should touch only the targeted quote block.
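A sketch of how that constraint might be enforced: only the matched block span is replaced, and a block whose converter declines is written back unchanged. The `BLOCK` pattern and the `rewrite_quotes` name are assumptions, and `convert` stands in for whatever per-block converter the script ends up using.

```python
import re

# Assumption: a legacy block runs from its opening comment to its closing comment.
BLOCK = re.compile(r"<!-- wp:quote.*?<!-- /wp:quote -->", re.DOTALL)


def rewrite_quotes(text, convert):
    """Replace each legacy block via convert(); leave all other text untouched."""

    def replace(match):
        original = match.group(0)
        converted = convert(original)
        # None means "skip": keep the original block exactly as it was.
        return original if converted is None else converted

    return BLOCK.sub(replace, text)
```

Text outside the matched spans is never re-serialized, so front matter and body content come through character for character.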

It should preserve:

It should not:

5. Skip ambiguous or risky quote structures

The script should skip, not guess, when it sees:

Every skip should include a reason so the remaining debt is visible and reviewable.
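A skip report could be as small as the sketch below. The `Skip` record, its fields, and the output format are hypothetical. The reasons in the usage example are illustrative and are not the design's list of skip conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skip:
    path: str    # post that contains the skipped block
    line: int    # 1-based line where the block starts
    reason: str  # short human-readable explanation


def format_skip_report(skips):
    """Render skips sorted by path and line, followed by a total count."""
    ordered = sorted(skips, key=lambda s: (s.path, s.line))
    lines = [f"{s.path}:{s.line}: skipped ({s.reason})" for s in ordered]
    lines.append(f"{len(skips)} block(s) skipped")
    return "\n".join(lines)
```

For example:

```python
skips = [
    Skip("_posts/b.md", 12, "example reason: nested markup"),
    Skip("_posts/a.md", 3, "example reason: multiple citations"),
]
print(format_skip_report(skips))
```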

6. Keep the conversion mechanical and reviewable

This issue is a normalization pass, not an editorial pass.

That means:

This keeps the batch diff reviewable and preserves authored signal.

File Changes

New

Modify

Verification

Automated

Add unit coverage for:

Manual

After the script runs with --write:

Repository Checks

Run the normal repository validators after conversion so the cleanup does not introduce date or snippet regressions.

Risks

Historical markup varies more than expected

Some old WordPress exports may look similar while carrying different internal structure. The safe response is to skip those blocks and report them, not to widen the converter until it starts guessing.

Broad batch diffs can hide mistakes

Because this cleanup can touch many posts, the conversion must stay mechanical and the skip report must be explicit so review stays tractable.

Markdown conversion can subtly change spacing

Even when meaning is preserved, a broad conversion can alter blank-line spacing. The script should aim for minimal readable markdown and avoid unrelated layout churn.

Acceptance Criteria

  1. A new normalization script scans all markdown posts in _posts/ for legacy wp:quote markup.
  2. The script defaults to dry-run and requires --write to edit files.
  3. Safe quote blocks are converted to native markdown blockquotes.
  4. Citation links are moved below the quote as Source: [Title](url).
  5. Ambiguous quote structures are skipped with explicit reasons.
  6. Surrounding non-quote content remains unchanged aside from minimal spacing needed for readable markdown.
  7. Unit tests cover both successful conversions and skip cases.
  8. Repository validation passes after the converted batch is written.