@gurupanguji

Source Citation Table Bug Design

Problem

Issue #115 remained broken after the first citation cleanup.

The current markdown-safe source shape:

[Source: Story Title | Publication](https://example.com/story)

is not actually safe in this repository’s markdown pipeline. On live pages, Kramdown treats the | inside the link label as a table cell separator and renders the line as a two-cell table instead of a link.

This affects at least the two reported posts and a wider set of archive entries that use the same [Source: ... | Publication](...) pattern.

Goals

  1. Fix the two reported posts on the live site.
  2. Bulk-clean the archive entries that use the same risky source-line shape.
  3. Keep posts markdown-only and portable.
  4. Add validation so future posts cannot reintroduce the table-parsing pattern.

Non-Goals

  1. Do not move source attribution back into custom HTML.
  2. Do not restyle source lines in this pass.
  3. Do not rewrite unrelated quote or embed markup.

Constraints

  1. The fix must stay markdown-first.
  2. The resulting source line must render correctly even when adjacent to WordPress-style HTML comments and blockquotes.
  3. The rewrite should be mechanical and reviewable across the archive.

Candidate Approaches

1. Keep the current bracketed source line and escape the pipe

Example:

[Source: Story Title \| Publication](https://example.com/story)

Pros:

Cons:

2. Keep the bracketed source line and drop the pipe-separated publication

Example:

[Source: Story Title](https://example.com/story)

Pros:

Cons:

3. Use an emphasized Source: label followed by a plain markdown link

Example:

*Source:* [Story Title](https://example.com/story)

Pros:

Cons:

Recommendation

Use approach 3.

Canonical source line after a quote:

*Source:* [Story Title](https://example.com/story)

If the old source label used Title | Publication, split it on the pipe during normalization and keep the title as the link label.

This is the blunt option, but it is legible, parser-safe, and portable.

Implementation Design

Content normalization

Update scripts/normalize_source_citations.py so it converts both of these risky shapes:

Source: [Story Title](https://example.com/story)
[Source: Story Title | Publication](https://example.com/story)

into:

*Source:* [Story Title](https://example.com/story)

The normalizer should also convert existing safe bracketed source lines without pipes into the same canonical emphasis form so the archive does not end up with mixed patterns.
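A minimal sketch of the line-level rewrite described above, not the actual contents of scripts/normalize_source_citations.py. The function name normalize_source_line and both regexes are hypothetical, and the sketch assumes the publication after the pipe is dropped, as the before and after shapes above suggest.

```python
import re

# Hypothetical patterns for the two risky shapes; the real script may differ.
BRACKETED = re.compile(r"^\[Source:\s*(?P<label>.+?)\]\((?P<url>[^)\s]+)\)\s*$")
RAW = re.compile(r"^Source:\s*\[(?P<label>.+?)\]\((?P<url>[^)\s]+)\)\s*$")


def normalize_source_line(line: str) -> str:
    """Return the canonical emphasis form, or the line unchanged.

    Expects a single line without its trailing newline.
    """
    match = BRACKETED.match(line) or RAW.match(line)
    if match is None:
        return line
    # Assumption: keep only the title and drop " | Publication".
    # The optional backslash also covers an already-escaped pipe.
    title = re.split(r"\s*\\?\|\s*", match.group("label"), maxsplit=1)[0].strip()
    return f"*Source:* [{title}]({match.group('url')})"
```

Because the canonical line starts with an asterisk, neither pattern matches it, so running the rewrite a second time leaves already-normalized lines untouched.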

Validation

Update scripts/validate_posts.py to reject:

  1. raw Source: [...](...) immediately after a blockquote
  2. bracketed [Source: ...](...) lines that still include |
  3. temporary gp-quote HTML if any remains

The validator should allow:

*Source:* [Story Title](https://example.com/story)
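The three rejection rules could be checked roughly as follows. This is a sketch under stated assumptions, not the real scripts/validate_posts.py: find_source_errors and the regexes are hypothetical names, and "immediately after a blockquote" is interpreted here as "the previous non-blank line starts with >".

```python
from __future__ import annotations

import re

# Hypothetical patterns; the real validator may use different ones.
RAW_SOURCE = re.compile(r"^Source:\s*\[[^\]]*\]\([^)]*\)")
BRACKETED_SOURCE = re.compile(r"^\[Source:(?P<label>.*)\]\([^)]*\)\s*$")


def find_source_errors(text: str) -> list[str]:
    """Return one message per rejected source line in a post body."""
    errors: list[str] = []
    previous = ""  # last non-blank line seen so far
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue
        # Rule 1: raw "Source: [..](..)" right after a blockquote line.
        if RAW_SOURCE.match(line) and previous.startswith(">"):
            errors.append(f"line {number}: raw Source: link after blockquote")
        # Rule 2: bracketed source label that still contains a pipe.
        bracketed = BRACKETED_SOURCE.match(line)
        if bracketed and "|" in bracketed.group("label"):
            errors.append(f"line {number}: pipe inside bracketed source label")
        # Rule 3: leftover temporary gp-quote HTML.
        if "gp-quote" in line:
            errors.append(f"line {number}: temporary gp-quote HTML")
        previous = line
    return errors
```

The canonical emphasis line matches none of the three rules, so it passes without needing an explicit allow-list.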

Scope

Bulk-update the affected archive set identified by the normalizer. The first pass already touched the posts most likely to contain these source lines, but the new scan should be run against the whole archive because some risky lines exist outside the first enforcement window.
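A whole-archive scan for the pipe-bearing shape might look like the sketch below. The scan_archive helper and the PIPE_SOURCE regex are hypothetical, and the sketch assumes posts are .md files under a single root directory; the actual layout and the normalizer's own scan logic may differ.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical pattern: a bracketed source label that contains a pipe.
PIPE_SOURCE = re.compile(r"^\[Source:[^\]]*\|[^\]]*\]\(")


def scan_archive(root: Path) -> dict[Path, list[int]]:
    """Map each markdown file under root to its risky line numbers."""
    hits: dict[Path, list[int]] = {}
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        numbers = [
            number
            for number, line in enumerate(lines, start=1)
            if PIPE_SOURCE.match(line.strip())
        ]
        if numbers:
            hits[path] = numbers
    return hits
```

Calling scan_archive on the posts directory gives a reviewable list of files and line numbers, which keeps the bulk rewrite mechanical and easy to spot-check.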

Verification

  1. Unit tests for normalizing pipe-bearing source labels.
  2. Unit tests for validator rejection of bracketed source lines with |.
  3. Run python3 scripts/validate_posts.py --today 2026-03-27.
  4. Spot-check the two broken live posts plus a few other archive examples that currently contain | in the source label.

Acceptance Criteria

  1. The two reported posts no longer show raw markdown or tables for source lines.
  2. All affected archive posts use the same markdown-only canonical source shape.
  3. Validation fails if a future post uses the risky pipe-bearing bracketed source line again.