@gurupanguji

wp:quote Safe Shape Paragraph Breaks Design

Goal

Extend the wp:quote normalizer so one more historical quote shape counts as safe:

a normal wp:quote block
a <p> paragraph with optional attributes such as id
text content that uses <br /><br /> to represent paragraph breaks inside the quote
an empty <p></p> paragraph used only for spacing
a paragraph whose only content is a single anchor tag

The immediate target is the third wp:quote block in _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md, which currently causes the whole file to be skipped.

Problem

The current normalizer is intentionally conservative. It converts only a narrow set of wp:quote shapes and skips the whole file if any quote block is unsupported.

For _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md, the first two quote blocks are already safe. The third one is skipped because it includes:

a paragraph with an id attribute
a double line break encoded as <br /><br />
a later separate link paragraph in the same quote block that does not match the existing cite rule

The user has now specified the full desired mechanical conversion for this block. This follow-up should therefore cover:

quoted text paragraphs with <br /><br />
empty spacing paragraphs
link-only paragraphs that map cleanly to a quoted markdown link

Safe Shape Rule

Treat the following as a safe quote paragraph:

<!-- wp:quote -->
<blockquote class="wp-block-quote"><!-- wp:paragraph -->
<p id="...">Paragraph one.<br /><br />Paragraph two.</p>
<!-- /wp:paragraph --></blockquote>
<!-- /wp:quote -->

Expected markdown output:

> Paragraph one.
>
> Paragraph two.

Important constraints:

the <p> tag may contain attributes
<br /> and <br /><br /> are allowed only as line-break markers inside otherwise plain text
<br /><br /> should become a blank markdown quote line between quoted paragraphs
an empty <p></p> should become a blank markdown quote line
a paragraph whose only content is one anchor should become a quoted markdown link line
no nested block elements
no figures, images, iframes, scripts, or arbitrary rich HTML inside the paragraph

File-Level Behavior

Keep the normalizer conservative at the file level.

Rules:

if every wp:quote block in a file is safe under the expanded rules, convert them all
if any wp:quote block in that file is still unsupported, skip the whole file
keep printing the exact file path and exact skip reason
prefer specific skip reasons over the generic unsupported inner markup when practical

As a result, this change alone may not make _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md fully convertible. Even so, the remaining blocker list should become more precise afterward, which is the main review surface the user asked for.
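The all-or-nothing rule above can be sketched roughly as follows. This is a minimal illustration, not the normalizer's actual code: the function name convert_file_blocks, the BlockConverter signature, and the "quote block N:" reason prefix are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: a per-block converter returns (markdown, None) on
# success, or (None, reason) when the block is unsupported.
BlockConverter = Callable[[str], Tuple[Optional[str], Optional[str]]]


def convert_file_blocks(
    blocks: List[str], convert_block: BlockConverter
) -> Tuple[Optional[List[str]], Optional[str]]:
    """Convert every quote block in a file, or none of them.

    Returns (converted_blocks, None) when all blocks are safe, otherwise
    (None, reason) carrying the first skip reason encountered.
    """
    converted: List[str] = []
    for index, block in enumerate(blocks, start=1):
        markdown, reason = convert_block(block)
        if reason is not None:
            # One unsupported block vetoes the whole file.
            return None, f"quote block {index}: {reason}"
        converted.append(markdown)
    return converted, None
```

Returning the reason alongside the veto keeps the exact skip reason available to whatever prints the file path.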

Design

1. Accept paragraph tags with attributes

The current paragraph matcher should accept <p> tags that carry harmless attributes such as id, while still rejecting nested HTML content inside the paragraph body.
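One way such a matcher might look, assuming a regex-based approach. The pattern and the name match_plain_paragraph are hypothetical and the real matcher may differ. This sketch rejects any tag in the body, so line breaks and anchors would need to be handled by the separate rules for those shapes.

```python
import re
from typing import Optional

# An opening <p> with optional attributes, a body containing no "<" at all,
# then the closing </p>. Hypothetical pattern, for illustration only.
PLAIN_PARAGRAPH = re.compile(r"<p(?:\s+[^<>]*)?>([^<]*)</p>")


def match_plain_paragraph(html: str) -> Optional[str]:
    """Return the paragraph body if html is a single plain-text <p>, else None."""
    match = PLAIN_PARAGRAPH.fullmatch(html.strip())
    return match.group(1) if match else None
```

Note that an empty paragraph yields an empty string rather than None, so callers should compare against None explicitly.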

2. Preserve paragraph semantics from `<br /><br />`

When a paragraph body contains <br /><br />, convert it into:

one quoted text paragraph
one blank markdown quote line
the next quoted text paragraph

Single <br /> should continue to behave as an inline line break inside the quote content. The main new behavior here is preserving the double break as a real paragraph separation in markdown.
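A minimal sketch of the double-break split, assuming the paragraph body has already been extracted. The name quote_paragraph_lines and the tolerated tag spellings (<br>, <br/>, <br />) are assumptions. Single breaks are left untouched here because this document does not spell out how the existing normalizer renders them.

```python
import re
from typing import List

# Two consecutive <br> tags (with or without the self-closing slash),
# optionally separated by whitespace. The accepted spellings are an assumption.
DOUBLE_BREAK = re.compile(r"(?:<br\s*/?>\s*){2}")


def quote_paragraph_lines(body: str) -> List[str]:
    """Split a paragraph body on double breaks into markdown quote lines."""
    parts = [part.strip() for part in DOUBLE_BREAK.split(body)]
    lines: List[str] = []
    for index, part in enumerate(parts):
        if index > 0:
            lines.append(">")  # blank quote line between paragraphs
        # Avoid emitting "> " with trailing whitespace for an empty part.
        lines.append(f"> {part}" if part else ">")
    return lines
```

For the body "Paragraph one.<br /><br />Paragraph two." this produces the three lines shown in the expected markdown output: a quoted paragraph, a bare > line, and a second quoted paragraph.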

3. Treat empty paragraphs as structural spacing

If a quote block contains:

<p></p>

convert it to:

>

This keeps the visual paragraph break without treating the quote as malformed.

4. Treat link-only paragraphs as quoted markdown links

If a quote paragraph contains exactly one anchor and no other inner markup, convert it to:

> [Link text](url)

This stays mechanical and safe because it preserves the same content with no inferred prose.
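The empty-paragraph and link-only rules could be sketched together along these lines. The patterns and the name convert_structural_paragraph are hypothetical. The sketch assumes double-quoted href values and plain link text, and it does not decode HTML entities or escape markdown special characters.

```python
import re
from typing import Optional

# A <p> (attributes allowed) with nothing but optional whitespace inside.
EMPTY_PARAGRAPH = re.compile(r"<p(?:\s+[^<>]*)?>\s*</p>")

# A <p> whose only content is one anchor with a double-quoted href and
# plain link text. Any other inner markup makes the match fail.
LINK_ONLY_PARAGRAPH = re.compile(
    r'<p(?:\s+[^<>]*)?>\s*<a\s+(?:[^<>]*?\s)?href="([^"]+)"[^<>]*>([^<]+)</a>\s*</p>'
)


def convert_structural_paragraph(html: str) -> Optional[str]:
    """Return a markdown quote line for an empty or link-only <p>, else None."""
    html = html.strip()
    if EMPTY_PARAGRAPH.fullmatch(html):
        return ">"
    match = LINK_ONLY_PARAGRAPH.fullmatch(html)
    if match:
        url, text = match.group(1), match.group(2).strip()
        return f"> [{text}]({url})"
    return None
```

Returning None for anything else leaves the decision to skip, and the reason for it, with the caller.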

5. Improve skip clarity where it is cheap

If the parser can cheaply distinguish:

empty paragraph
unsupported link-only paragraph
unsupported rich inner markup

it should report that more specific reason. But correctness matters more than taxonomy polish. The tool should not be expanded into a broad HTML classifier.
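A cheap classifier in this spirit might look like the sketch below. It is meant to run only on paragraphs that have already failed the safe-shape checks. Apart from the generic "unsupported inner markup" wording quoted earlier, the reason strings and the name skip_reason are assumptions for illustration.

```python
import re

# Reason strings other than the generic one are hypothetical wording.
REASON_EMPTY = "empty paragraph"
REASON_LINK = "unsupported link-only paragraph"
REASON_RICH = "unsupported rich inner markup"
REASON_GENERIC = "unsupported inner markup"

PARAGRAPH = re.compile(r"<p(?:\s+[^<>]*)?>(.*)</p>", re.DOTALL)
STARTS_WITH_ANCHOR = re.compile(r"<a[\s>]")


def skip_reason(paragraph_html: str) -> str:
    """Pick the most specific skip reason that is cheap to determine."""
    match = PARAGRAPH.fullmatch(paragraph_html.strip())
    if match is None:
        return REASON_GENERIC  # not a <p> at all, so no finer guess is made
    body = match.group(1).strip()
    if not body:
        return REASON_EMPTY
    if STARTS_WITH_ANCHOR.match(body) and body.endswith("</a>"):
        return REASON_LINK
    if "<" in body:
        return REASON_RICH
    return REASON_GENERIC
```

The checks are string-level only, which keeps the tool well short of a general HTML classifier; anything ambiguous falls back to the generic reason.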

Tests

Add focused tests for:

a wp:quote paragraph with <br /><br /> converting into two markdown quote paragraphs
the exact third-block text shape from _posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md, including the id attribute
an empty paragraph converting to a blank > line
a link-only paragraph converting to a quoted markdown link
a regression showing the whole file still skips if another quote block in that file remains unsupported for reasons outside these new safe shapes
dry-run output after the change so the skip list remains visible

Verification

Run:

python3 -m unittest tests/test_normalize_wp_quotes.py
python3 scripts/normalize_wp_quotes.py

Then inspect whether:

the new safe-shape tests pass
the skip reasons remain visible
_posts/2025-07-21-negativity-blackholes-unburdening-and-resilience.md either converts cleanly or now reports a more precise remaining blocker

Acceptance Criteria

The normalizer accepts wp:quote paragraphs with harmless <p> attributes.
<br /><br /> inside a safe paragraph becomes two markdown quote paragraphs separated by a blank > line.
Empty quote paragraphs become a blank > line.
Link-only quote paragraphs become quoted markdown links.
Existing safe quote shapes continue to pass unchanged.
Files with any remaining unsupported wp:quote block still skip at the file level.
Dry-run output still makes the remaining skipped files and reasons visible.