@gurupanguji

wp:quote Safe Shape Paragraph Breaks Design

Goal

Extend the wp:quote normalizer so one more historical quote shape counts as safe:

The immediate target is the third wp:quote block in _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md, which currently causes the whole file to be skipped.

Problem

The current normalizer is intentionally conservative. It converts only a narrow set of wp:quote shapes and skips the whole file if any quote block is unsupported.

For _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md, the first two quote blocks are already safe. The third one is skipped because it includes:

The user has now specified the full desired mechanical conversion for this block. This follow-up should therefore cover:

Safe Shape Rule

Treat the following as a safe quote paragraph:

<!-- wp:quote -->
<blockquote class="wp-block-quote"><!-- wp:paragraph -->
<p id="...">Paragraph one.<br /><br />Paragraph two.</p>
<!-- /wp:paragraph -->

Expected markdown output:

> Paragraph one.
>
> Paragraph two.

Important constraints:

File-Level Behavior

Keep the normalizer conservative at the file level.

Rules:

That means this change may not, on its own, make _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md fully convertible. Even so, the remaining blocker list should be more precise after the change, and that list is the main review surface the user asked for.
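The file-level rule can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: BlockResult and normalize_file are hypothetical names, and the real normalizer may structure its results differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlockResult:
    """Outcome for one wp:quote block (hypothetical shape)."""
    markdown: Optional[str]      # converted quote, or None if unsupported
    skip_reason: Optional[str]   # why the block was rejected, if it was


def normalize_file(
    results: List[BlockResult],
) -> Tuple[Optional[List[str]], List[str]]:
    """Convert a file only when every quote block is safe.

    Returns (converted_blocks, reasons). If any block is unsupported,
    converted_blocks is None and reasons lists every blocker, so the
    dry-run output can show all of them at once.
    """
    reasons = [r.skip_reason or "unsupported shape"
               for r in results if r.markdown is None]
    if reasons:
        return None, reasons
    return [r.markdown for r in results], []
```

Collecting every reason before returning, rather than stopping at the first unsupported block, is what keeps the blocker list precise.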

Design

1. Accept paragraph tags with attributes

The current paragraph matcher should accept <p> tags that carry harmless attributes such as id, while still rejecting nested HTML content inside the paragraph body.
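One possible shape for such a matcher is sketched below. The names (HARMLESS_ATTRS, safe_paragraph_body) and the allowlist contents are assumptions for illustration; the existing matcher in scripts/normalize_wp_quotes.py may look quite different.

```python
import re
from typing import Optional

# Assumption: only "id" is treated as harmless here; the real list may differ.
HARMLESS_ATTRS = {"id"}

# <p>, optionally with double-quoted attributes, then a body, then </p>.
PARAGRAPH_RE = re.compile(
    r'<p(?P<attrs>(?:\s+[\w-]+="[^"<>]*")*)\s*>(?P<body>.*?)</p>',
    re.DOTALL,
)
ATTR_RE = re.compile(r'\s+([\w-]+)="[^"]*"')
BR_RE = re.compile(r"<br\s*/?>")


def safe_paragraph_body(html: str) -> Optional[str]:
    """Return the paragraph body if the shape is safe, else None."""
    match = PARAGRAPH_RE.fullmatch(html.strip())
    if match is None:
        return None
    # Reject any attribute that is not on the allowlist.
    names = set(ATTR_RE.findall(match.group("attrs")))
    if not names <= HARMLESS_ATTRS:
        return None
    body = match.group("body")
    # After removing <br> variants, any remaining "<" means nested HTML.
    if "<" in BR_RE.sub("", body):
        return None
    return body
```

For example, safe_paragraph_body('<p id="abc">Text</p>') returns the body, while a paragraph wrapping an <em> tag returns None.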

2. Preserve paragraph semantics from <br /><br />

When a paragraph body contains <br /><br />, convert it into two markdown quote paragraphs separated by a blank > line.

Single <br> should continue to behave as an inline line break inside the quote content. The main new behavior here is preserving the double break as a real paragraph separation in markdown.
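A minimal sketch of the double-break split, assuming the paragraph body has already been validated as safe. body_to_quote_lines is a hypothetical helper name, and a single <br> is deliberately left untouched here so that the existing inline-break handling can process it.

```python
import re
from typing import List

# Two consecutive <br> tags (any of <br>, <br/>, <br />), optionally
# separated or followed by whitespace.
DOUBLE_BR_RE = re.compile(r"(?:<br\s*/?>\s*){2}")


def body_to_quote_lines(body: str) -> List[str]:
    """Split a paragraph body on double breaks into quoted paragraphs."""
    lines: List[str] = []
    for paragraph in DOUBLE_BR_RE.split(body):
        if lines:
            # A bare ">" keeps both paragraphs inside the same blockquote.
            lines.append(">")
        lines.append(f"> {paragraph.strip()}")
    return lines
```

Joining the result with newlines for the body "Paragraph one.<br /><br />Paragraph two." gives the three-line quote shown under Safe Shape Rule.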

3. Treat empty paragraphs as structural spacing

If a quote block contains:

<p></p>

convert it to:

>

This keeps the visual paragraph break without treating the quote as malformed.
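As a rough sketch, the empty-paragraph case could be detected like this. The helper name is hypothetical, and allowing attributes on the empty <p> is an assumption carried over from rule 1.

```python
import re
from typing import Optional

# An opening <p> (attributes tolerated), only whitespace, then </p>.
EMPTY_PARAGRAPH_RE = re.compile(r"<p(?:\s[^<>]*)?>\s*</p>")


def empty_paragraph_to_quote(paragraph_html: str) -> Optional[str]:
    """Return a bare ">" for an empty paragraph, else None."""
    if EMPTY_PARAGRAPH_RE.fullmatch(paragraph_html.strip()):
        return ">"
    return None
```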

4. Convert link-only paragraphs to markdown links

If a quote paragraph contains exactly one anchor and no other inner markup, convert it to:

> [Link text](url)

This stays mechanical and safe because it preserves the same content with no inferred prose.
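The link-only case might be handled along these lines. This is a sketch under assumptions: link_only_body_to_quote is a hypothetical name, href is assumed to be the first attribute and double-quoted, and HTML entities in the link text are passed through without decoding.

```python
import re
from typing import Optional

# Exactly one <a href="...">text</a>, with no markup inside the link text.
LINK_ONLY_RE = re.compile(
    r'<a\s+href="(?P<url>[^"<>\s]+)"[^<>]*>(?P<text>[^<>]+)</a>'
)


def link_only_body_to_quote(body: str) -> Optional[str]:
    """Convert a paragraph body that is a single anchor, else return None."""
    match = LINK_ONLY_RE.fullmatch(body.strip())
    if match is None:
        # Text around the anchor or nested tags: leave it to other rules.
        return None
    return f"> [{match.group('text').strip()}]({match.group('url')})"
```

Using fullmatch is what enforces "exactly one anchor and no other inner markup": any prose before or after the link makes the match fail.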

5. Improve skip clarity where it is cheap

If the parser can cheaply distinguish:

it should report that more specific reason. But correctness matters more than taxonomy polish. The tool should not be expanded into a broad HTML classifier.

Tests

Add focused tests for:

Verification

Run:

python3 -m unittest tests/test_normalize_wp_quotes.py
python3 scripts/normalize_wp_quotes.py

Then inspect whether:

Acceptance Criteria

  1. The normalizer accepts wp:quote paragraphs with harmless <p> attributes.
  2. <br /><br /> inside a safe paragraph becomes two markdown quote paragraphs separated by a blank > line.
  3. Empty quote paragraphs become a blank > line.
  4. Link-only quote paragraphs become quoted markdown links.
  5. Existing safe quote shapes continue to pass unchanged.
  6. Files with any remaining unsupported wp:quote block still skip at the file level.
  7. Dry-run output still makes the remaining skipped files and reasons visible.