For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Reduce issue #104 markup debt by expanding safe quote normalization and adding a dedicated safe embed normalizer for YouTube and Twitter/X, while explicitly skipping risky historical markup.
Architecture: Keep quote normalization and embed normalization as separate tools. Extend the quote normalizer to convert multiple safe quote blocks and simple raw HTML blockquotes. Add a new embed normalizer that handles YouTube locally and Twitter/X through a migration-time oEmbed helper. Drive both tools with tests first, use dry-run reporting before any rewrite, then apply only the safe batch and verify the results.
Tech Stack: Python 3, unittest, regex/string parsing, Jekyll markdown content, live HTTP fetch for X oEmbed during write-mode migration, existing repository validators
Files:
tests/test_normalize_wp_quotes.py
tests/test_normalize_wp_embeds.py
Optional create: focused text fixtures under tests/fixtures/
wp:quote blocks
Create at least one fixture or inline test case with two or more independent safe quote blocks in one post and assert they all convert cleanly.
Cover at least:
<blockquote>plain text</blockquote>
<blockquote><p>...</p><p>...</p></blockquote>
cite-bearing <blockquote>...<cite><a ...>...</a></cite></blockquote>
Cover:
helper failure path for Twitter/X
Run:
python3 -m unittest tests/test_normalize_wp_quotes.py tests/test_normalize_wp_embeds.py
Expected: FAIL for the newly added behaviors before implementation.
Files:
scripts/normalize_wp_quotes.py
Modify: tests/test_normalize_wp_quotes.py
Replace the current whole-post skip for multiple blocks with per-block conversion when each block is individually safe.
Implement a second parser path for safe raw HTML blockquotes that map mechanically to markdown quote lines plus optional Source: output.
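The raw HTML path can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `blockquote_to_markdown`, the three regexes, and the exact shape of the `Source:` line (a markdown link in its own paragraph after the quote) are all assumptions. It accepts only attribute-free blockquotes holding plain text or plain `<p>` paragraphs, with an optional trailing cite link, and returns `None` for anything else so the caller can skip.

```python
import html
import re

# Hypothetical patterns; the real script may recognize different shapes.
BLOCKQUOTE_RE = re.compile(r"<blockquote>(.*?)</blockquote>", re.DOTALL)
CITE_RE = re.compile(r'<cite><a href="([^"]+)">([^<]+)</a></cite>\s*$')
PARA_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def blockquote_to_markdown(block):
    """Return markdown for a safe blockquote, or None to signal a skip."""
    match = BLOCKQUOTE_RE.fullmatch(block.strip())
    if match is None:
        return None
    body = match.group(1).strip()

    # Peel off an optional trailing cite link (assumed Source: format).
    source = None
    cite = CITE_RE.search(body)
    if cite:
        label = html.unescape(cite.group(2))
        source = "Source: [{}]({})".format(label, cite.group(1))
        body = body[: cite.start()].strip()

    paragraphs = PARA_RE.findall(body)
    if paragraphs:
        # Text outside the <p> elements means the shape is not mechanical.
        if PARA_RE.sub("", body).strip():
            return None
    else:
        paragraphs = [body]

    # Any remaining tag or an empty paragraph is treated as unsafe.
    if any("<" in p or not p.strip() for p in paragraphs):
        return None

    lines = []
    for index, paragraph in enumerate(paragraphs):
        if index:
            lines.append(">")  # blank quote line between paragraphs
        lines.append("> " + " ".join(html.unescape(paragraph).split()))
    markdown = "\n".join(lines)
    if source:
        markdown += "\n\n" + source
    return markdown
```

Returning `None` instead of raising keeps the skip decision with the caller, which fits the dry-run reporting described in this plan.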
Preserve hard skips for:
unsupported cite shapes
Make sure the reporting still shows which files would convert and which files are skipped, now with the finer-grained quote support.
Run:
python3 -m unittest tests/test_normalize_wp_quotes.py
Expected: PASS
Files:
scripts/normalize_wp_embeds.py
Create: tests/test_normalize_wp_embeds.py
Recognize:
<!-- wp:embed ... -->
<!-- wp:core-embed/... -->
wrapper body URL extraction
Support the archive’s common YouTube URL forms and render one stable iframe shape.
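One possible shape for the YouTube path is sketched below. The URL forms covered (watch, youtu.be, and embed links), the 11-character video ID check, and the iframe attributes are assumptions for illustration; the archive's actual forms should come from the dry-run report. The names `youtube_video_id` and `youtube_iframe` are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_video_id(url):
    """Extract the video ID from a supported URL form, or return None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/embed/"):
            candidate = parsed.path[len("/embed/"):]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def youtube_iframe(url):
    """Render one fixed iframe shape, or None when the URL is unsupported."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    return (
        '<iframe width="560" height="315" '
        'src="https://www.youtube.com/embed/{}" '
        'title="YouTube video player" frameborder="0" '
        "allowfullscreen></iframe>"
    ).format(video_id)
```

Funnelling every URL form through a single ID extractor means the rendered output cannot drift between forms, which keeps the rewrite diff mechanical.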
Add a small helper that:
Keep this boundary easy to mock in tests.
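A mockable boundary might look like the sketch below. It assumes the helper calls the public oEmbed endpoint at `https://publish.twitter.com/oembed` and reads the `html` field of the JSON response; the names `http_get` and `fetch_tweet_html` and the injected `fetch` parameter are hypothetical. Any network or parse failure returns `None` so the caller can report a skip.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint for Twitter/X oEmbed lookups.
OEMBED_ENDPOINT = "https://publish.twitter.com/oembed"


def http_get(url, timeout=10):
    """Default transport: the only function that touches the network."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_tweet_html(tweet_url, fetch=http_get):
    """Return the embed HTML for a tweet URL, or None on any failure."""
    query = urlencode({"url": tweet_url})
    try:
        payload = json.loads(fetch(OEMBED_ENDPOINT + "?" + query))
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # JSON and decoding errors are ValueError subclasses.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("html") or None
```

In tests, passing a fake `fetch` callable exercises both the success path and the failure path without a live HTTP request.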
For every non-Twitter, non-YouTube provider, report the provider name and skip without rewriting the post.
Run:
python3 -m unittest tests/test_normalize_wp_embeds.py
Expected: PASS
Files:
scripts/normalize_wp_quotes.py
scripts/normalize_wp_embeds.py
Modify: related tests
Assert in tests that no file changes occur without explicit write mode.
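A dry-run guard and its test could be sketched like this. The `process_file` function and its `write` parameter are hypothetical stand-ins for whatever entry point the scripts expose; the point is only that the file's bytes are compared before and after a run without write mode.

```python
import tempfile
import unittest
from pathlib import Path


def process_file(path, transform, write=False):
    """Return True when transform would change the file; write only on request."""
    original = path.read_text(encoding="utf-8")
    updated = transform(original)
    changed = updated != original
    if changed and write:
        path.write_text(updated, encoding="utf-8")
    return changed


class DryRunTest(unittest.TestCase):
    def test_dry_run_leaves_file_untouched(self):
        with tempfile.TemporaryDirectory() as tmp:
            post = Path(tmp) / "post.md"
            post.write_text("old\n", encoding="utf-8")
            before = post.read_bytes()

            # Dry run reports the pending change but must not write.
            self.assertTrue(process_file(post, str.upper))
            self.assertEqual(post.read_bytes(), before)

            # Explicit write mode applies the same change.
            self.assertTrue(process_file(post, str.upper, write=True))
            self.assertEqual(post.read_text(encoding="utf-8"), "OLD\n")
```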
Allow one file or a short file list to be processed for local spot checks.
If a post contains a mix of supported and unsupported embed blocks, decide at the post level whether partial conversion is acceptable. My recommendation is yes for independent safe blocks, but only if skipped blocks remain untouched and reporting is explicit.
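If partial conversion is accepted, the per-block pass could be sketched as below. The block regex assumes the usual serialized form, an opening `<!-- wp:embed ... -->` or `<!-- wp:core-embed/... -->` comment paired with a matching `<!-- /wp:... -->` close; `convert_embeds` and the `convert_block` callback are hypothetical names. A block whose callback returns `None` is emitted byte-for-byte unchanged and counted as skipped.

```python
import re

# Assumed block shape: opening comment, body, matching closing comment.
EMBED_BLOCK_RE = re.compile(
    r"<!-- wp:(?P<name>embed|core-embed/[\w-]+)\b.*?-->"
    r".*?"
    r"<!-- /wp:(?P=name) -->",
    re.DOTALL,
)


def convert_embeds(text, convert_block):
    """Convert each embed block independently and report the outcome."""
    report = {"converted": 0, "skipped": 0}

    def replace(match):
        replacement = convert_block(match.group(0))
        if replacement is None:
            report["skipped"] += 1
            return match.group(0)  # leave the skipped block untouched
        report["converted"] += 1
        return replacement

    return EMBED_BLOCK_RE.sub(replace, text), report
```

Because the counts are returned alongside the new text, a dry run can print the same report without writing anything.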
Cover:
Files:
_posts/*.md
Verify: normalization scripts
Run:
python3 scripts/normalize_wp_quotes.py
python3 scripts/normalize_wp_embeds.py
Expected:
no file mutations
Check that the backlog falls into the intended v1 buckets:
skipped long-tail providers
Open at least:
Files:
_posts/
Verify: both normalization scripts
Run:
python3 scripts/normalize_wp_quotes.py --write
Run:
python3 scripts/normalize_wp_embeds.py --write
Expected:
reporting clearly separates converted and skipped files
Run:
git diff --stat
git diff -- _posts
Verify that the content changes stay mechanical and match the planned output shapes.
Files:
tests/
scripts/validate_posts.py
Verify: HTML validation tooling
Run:
python3 -m unittest tests/test_normalize_wp_quotes.py tests/test_normalize_wp_embeds.py
Expected: PASS
Run:
python3 scripts/validate_posts.py --today "$(date +%F)"
Expected: PASS
Expected: PASS
Inspect a few converted archive posts in bundle exec jekyll serve to confirm:
Files:
Verify: converted posts
Call out:
Twitter/X helper failures if any
If the rewrite batch is too noisy, split the work into:
archive content rewrite
After this pass, the next issue should target only one leftover class at a time instead of reopening “clean up HTML” in the abstract.