@gurupanguji

Auto-Convert Media Links Design

Goal

Normalize supported bare media links in new or edited blog posts close to authoring time, so that new commits stop reintroducing cleanup debt into the repository.

Scope

Version 1 targets _posts/*.md only and rewrites markdown body content only.

Existing Building Blocks

The new work should reuse these paths instead of introducing a parallel policy surface.

Proposed Architecture

Add a new authoring-time transformer at scripts/normalize_post_media_links.py.

Responsibilities:

Extend scripts/validate_posts.py with a narrow detection pass that reports two kinds of leftovers: standalone supported media links that were not converted, and standalone bare URLs that should have been wrapped. This keeps CI and pre-push aligned with the local transformer.

Update hooks/pre-commit to:

  1. collect staged post paths
  2. run the new transformer in write mode on those paths
  3. re-stage any modified post files
  4. continue with snippet generation and validation
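The first three hook steps could be sketched roughly as follows. This is an illustration only: the `--write` flag, the function names, and the choice to drive the steps from Python rather than from the hook script itself are assumptions, not part of the existing tooling.

```python
import subprocess
from pathlib import PurePosixPath


def staged_post_paths(name_only_output: str) -> list:
    """Filter `git diff --cached --name-only` output down to _posts/*.md."""
    paths = []
    for line in name_only_output.splitlines():
        candidate = PurePosixPath(line.strip())
        # Only files directly inside _posts/ with a .md suffix are in scope.
        if (
            len(candidate.parts) == 2
            and candidate.parts[0] == "_posts"
            and candidate.suffix == ".md"
        ):
            paths.append(str(candidate))
    return paths


def run_hook_steps() -> None:
    # Step 1: collect staged post paths (added, copied, modified, renamed).
    listing = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    posts = staged_post_paths(listing)
    if not posts:
        return
    # Step 2: run the transformer in write mode (the flag name is hypothetical).
    subprocess.run(
        ["python3", "scripts/normalize_post_media_links.py", "--write", *posts],
        check=True,
    )
    # Step 3: re-stage the posts so the commit contains the rewritten content.
    subprocess.run(["git", "add", "--", *posts], check=True)
    # Step 4 (snippet generation and validation) stays in the existing hook.
```

One consequence worth noting: `git add` re-stages the whole file, so a post that was only partially staged would end up fully staged after the hook runs.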

Rewrite Rules

Standalone YouTube URLs

Accepted inputs:

Rewrite to the same stable iframe shape already emitted by scripts/normalize_wp_embeds.py.
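A minimal sketch of the YouTube rule, under loud assumptions: the accepted URL forms are not enumerated here, so the `watch?v=` and `youtu.be` patterns below are guesses, and the iframe markup is a placeholder. The real markup should be copied from scripts/normalize_wp_embeds.py so both scripts emit the same shape. All function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from this alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_video_id(line: str) -> Optional[str]:
    """Return the video ID if the line is nothing but a YouTube URL."""
    url = line.strip()
    if not url or any(ch.isspace() for ch in url):
        return None  # empty line, or URL embedded in prose
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    candidate = None
    # Assumed input forms; the design's accepted list may differ.
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
    elif host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


def youtube_iframe(video_id: str) -> str:
    # Placeholder markup, NOT the shape emitted by normalize_wp_embeds.py.
    return (
        f'<iframe src="https://www.youtube.com/embed/{video_id}" '
        'loading="lazy" allowfullscreen></iframe>'
    )
```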

Standalone Twitter/X Status URLs

Accepted inputs:

Rewrite by fetching oEmbed HTML from publish.twitter.com/oembed with dnt=1 and hide_thread=true.

If the fetch fails, leave the line untouched and let validation fail with a clear error. The failure should name the file and line so the author can retry or adjust the post.
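The fetch-or-leave-untouched behavior might look like the sketch below. The endpoint and the `dnt` and `hide_thread` values come from the rule above; the function names, the 10-second timeout, the error message format, and the injectable `get` parameter (which lets tests avoid the network) are assumptions.

```python
import json
import urllib.request
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode

OEMBED_ENDPOINT = "https://publish.twitter.com/oembed"


def oembed_request_url(status_url: str) -> str:
    """Build the oEmbed request URL with dnt=1 and hide_thread=true."""
    query = urlencode({"url": status_url, "dnt": "1", "hide_thread": "true"})
    return f"{OEMBED_ENDPOINT}?{query}"


def _http_get(url: str) -> str:
    # Timeout value is an assumption; pick whatever suits commit-time use.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def fetch_embed_html(
    status_url: str, get: Callable[[str], str] = _http_get
) -> Tuple[Optional[str], Optional[str]]:
    """Return (html, None) on success or (None, reason) on failure."""
    try:
        payload = json.loads(get(oembed_request_url(status_url)))
        html = payload["html"]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # URLError and timeouts are OSError; bad JSON is ValueError.
        return None, f"{type(exc).__name__}: {exc}"
    if not isinstance(html, str) or not html.strip():
        return None, "oEmbed response had no usable html field"
    return html.strip(), None


def rewrite_status_line(
    path: str, lineno: int, line: str, get: Callable[[str], str] = _http_get
) -> Tuple[str, Optional[str]]:
    """Return (new_line, None) or (original_line, error naming file and line)."""
    html, reason = fetch_embed_html(line.strip(), get)
    if html is None:
        return line, f"{path}:{lineno}: Twitter/X oEmbed fetch failed ({reason})"
    return html, None
```

Returning the original line together with a reason keeps the transformer side-effect free on failure and gives the validator something specific to print.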

Standalone Generic URLs

If a line consists of a bare non-media URL with optional surrounding whitespace, rewrite it to:

[url](url)

Do not attempt prose-aware inline rewrites in v1.
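The generic rule is small enough to sketch in full. The function name and regex are illustrative, and the sketch assumes the caller has already ruled out supported media URLs before calling it.

```python
import re

# A line that is nothing but one http(s) URL plus optional whitespace.
_BARE_URL_LINE = re.compile(r"^\s*(https?://\S+)\s*$")


def wrap_bare_url(line: str) -> str:
    """Rewrite a standalone URL line to [url](url); leave other lines alone."""
    match = _BARE_URL_LINE.match(line)
    if match is None:
        return line  # prose, markdown links, and blank lines pass through
    url = match.group(1)
    # Preserve the original line ending (\n or \r\n), if any.
    ending = line[len(line.rstrip("\r\n")):]
    return f"[{url}]({url}){ending}"
```

Because the pattern requires the URL to be the only non-whitespace content on the line, inline prose URLs and existing markdown links never match, which is the v1 boundary described above.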

Detection Boundaries

The transformer and validator should both use the same body scanner rules:

The intent is mechanical normalization, not markdown authorship assistance.
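A shared body scanner could be sketched as below. The specific rules are assumptions drawn from the contexts named in the test plan: this version skips YAML front matter and fenced code blocks and reports only lines that consist of a single URL. It does not attempt HTML block detection, which would need an additional rule. The function name is hypothetical.

```python
import re
from typing import Iterator, Tuple

_FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")
_STANDALONE_URL = re.compile(r"^\s*(https?://\S+)\s*$")


def standalone_url_lines(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, url) for standalone URL lines in the body."""
    lines = text.splitlines()
    start = 0
    # Skip YAML front matter delimited by '---' lines at the top of the file.
    if lines and lines[0].strip() == "---":
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                start = index + 1
                break
    fence = None  # marker of the currently open fenced code block, if any
    for lineno, line in enumerate(lines[start:], start=start + 1):
        marker = _FENCE.match(line)
        if fence is None:
            if marker:
                fence = marker.group(1)
                continue
        else:
            # A closing fence uses the same character, is at least as long
            # as the opener, and carries no info string.
            run = marker.group(1) if marker else ""
            if run and run[0] == fence[0] and len(run) >= len(fence) and line.strip() == run:
                fence = None
            continue
        found = _STANDALONE_URL.match(line)
        if found:
            yield lineno, found.group(1)
```

If both the transformer and the validator call one function like this, they cannot drift apart on what counts as a candidate line.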

Test Plan

Add tests/test_normalize_post_media_links.py covering:

  1. standalone YouTube URL converts to iframe
  2. standalone Twitter/X status URL converts using mocked oEmbed HTML
  3. standalone generic URL wraps as markdown link
  4. inline prose URL remains unchanged
  5. code fence, inline code, markdown link, and HTML contexts remain unchanged
  6. Twitter/X fetch failure leaves text unchanged and reports a reason

Extend tests/test_validate_posts.py to cover:

  1. leftover standalone YouTube URL fails validation
  2. leftover standalone Twitter/X status URL fails validation
  3. leftover standalone generic bare URL on its own line fails validation
  4. converted iframe and converted markdown link pass

Roll out the validator rule only for posts whose filename date is on or after the policy start date. This keeps the historical archive from failing CI while still enforcing the rule for new authoring.
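The date gate could be sketched like this, assuming post filenames follow the `YYYY-MM-DD-title.md` pattern. The policy start date below is a placeholder, not a decided value, and the function names are hypothetical.

```python
import re
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical placeholder; the real policy start date is not set here.
POLICY_START = date(2025, 1, 1)

_FILENAME_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-")


def post_date(path: str) -> Optional[date]:
    """Parse the leading YYYY-MM-DD from a post filename, if present and valid."""
    match = _FILENAME_DATE.match(PurePosixPath(path).name)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:
        return None  # e.g. month 13 or day 40


def rule_applies(path: str, policy_start: date = POLICY_START) -> bool:
    """True when the post is dated on or after the policy start date."""
    dated = post_date(path)
    return dated is not None and dated >= policy_start
```

This sketch treats posts with a missing or invalid filename date as exempt. Failing them instead would be an equally defensible choice and is a one-line change.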

Risks

Commit-time network dependency

Twitter/X conversion now happens in the default pre-commit path. If the oEmbed endpoint is slow or unavailable, commits that stage Twitter/X links will fail validation until the fetch succeeds.

This is acceptable because the user explicitly prefers convenience over strict offline safety for this workflow.

False positives

Line-oriented detection can overreach if it scans inside code or HTML. The scanner must stay narrow and be covered by tests for those contexts.

Decision

Solve issue #105 with one versioned transformer plus one shared validator path. Do not bury logic directly in the git hook and do not rely on manual cleanup after the post is already written.