@gurupanguji

Auto-Convert Media Links Design

Goal

Normalize supported bare media links in new or edited blog posts close to authoring time, so that new posts stop reintroducing cleanup debt into the repository.

Scope

Version 1 targets _posts/*.md only and rewrites markdown body content only.

Convert standalone bare YouTube URLs to the repository’s canonical iframe embed shape.
Convert standalone bare Twitter/X status URLs to the preferred embed HTML, fetched from the Twitter oEmbed endpoint.
Wrap standalone non-media bare URLs in markdown link syntax.
Ignore inline prose URLs in v1.
Ignore URLs inside YAML front matter, fenced code blocks, inline code, markdown links, and raw HTML blocks or attributes.

Existing Building Blocks

scripts/normalize_wp_embeds.py already knows the canonical YouTube iframe shape and the Twitter/X oEmbed fetch path.
scripts/validate_posts.py is the shared enforcement surface used by local hooks and CI.
hooks/pre-commit is the tracked hook template installed into .git/hooks by scripts/install_git_hooks.sh.

The new work should reuse these paths instead of introducing a parallel policy surface.

Proposed Architecture

Add a new authoring-time transformer at scripts/normalize_post_media_links.py.

Responsibilities:

inspect explicit post paths or the staged _posts/*.md set
parse post body while preserving front matter and untouched prose
rewrite only deterministic standalone link lines
write changes in place when requested
report what changed and what was skipped
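
The responsibilities above suggest a small command-line surface. The following is only a sketch: the flag names --staged and --write and the function build_parser are hypothetical and are not specified anywhere in this design.

```python
import argparse


def build_parser():
    """Build a hypothetical CLI for scripts/normalize_post_media_links.py."""
    parser = argparse.ArgumentParser(
        description="Normalize standalone bare links in blog posts."
    )
    # Explicit post paths; may be empty when --staged is used instead.
    parser.add_argument("paths", nargs="*", help="post files to inspect")
    # Assumed flag: inspect the staged _posts/*.md set instead of explicit paths.
    parser.add_argument("--staged", action="store_true",
                        help="inspect staged _posts/*.md files")
    # Assumed flag: without it the tool would only report, never modify files.
    parser.add_argument("--write", action="store_true",
                        help="write changes in place")
    return parser
```

Keeping write mode opt-in would let the same entry point serve as a dry-run reporter outside the hook.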

Extend scripts/validate_posts.py with a narrow detection pass that reports leftover standalone supported media links and leftover standalone naked URLs that should have been wrapped. This keeps CI and pre-push aligned with the local transformer.

Update hooks/pre-commit to:

collect staged post paths
run the new transformer in write mode on those paths
re-stage any modified post files
continue with snippet generation and validation
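
The hook itself is a shell script, but the path collection and re-staging steps can be sketched in Python for illustration. The helper names below are hypothetical. The sketch assumes that "staged post paths" means added, copied, or modified files directly under _posts/, and that git prints paths relative to the repository root (its default).

```python
import subprocess
from pathlib import PurePosixPath


def is_post_path(path):
    """True for files directly under _posts/ with a .md suffix."""
    parts = PurePosixPath(path).parts
    return len(parts) == 2 and parts[0] == "_posts" and parts[1].endswith(".md")


def staged_post_paths(run=subprocess.run):
    """List staged posts; `run` is injectable so tests can avoid calling git."""
    result = run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [p for p in result.stdout.splitlines() if is_post_path(p)]


def restage(paths, run=subprocess.run):
    """Re-add files the transformer modified so the commit includes the rewrite."""
    if paths:
        run(["git", "add", "--", *paths], check=True)
```

Note that git may quote file names containing unusual characters in this output; a real implementation might prefer the -z option and split on NUL bytes.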

Rewrite Rules

Standalone YouTube URLs

Accepted inputs:

https://youtu.be/<id>
https://www.youtube.com/watch?v=<id>
either form with additional query parameters such as si

Rewrite to the same stable iframe shape already emitted by scripts/normalize_wp_embeds.py.
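
As a rough illustration of the accepted inputs, the sketch below extracts the video id from a standalone line. The function names are hypothetical, and the 11-character id pattern is an assumption based on commonly observed YouTube ids. The iframe markup is a placeholder only; the real shape should come from scripts/normalize_wp_embeds.py.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: ids are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(line):
    """Return the video id if the whole line is a supported URL, else None."""
    url = line.strip()
    if not url or any(ch.isspace() for ch in url):
        return None  # prose or multiple tokens, not a standalone URL
    parts = urlparse(url)
    if parts.scheme != "https":
        return None
    if parts.netloc == "youtu.be":
        candidate = parts.path.lstrip("/")
    elif parts.netloc == "www.youtube.com" and parts.path == "/watch":
        candidate = parse_qs(parts.query).get("v", [""])[0]
    else:
        return None
    # Extra query parameters such as si are parsed but otherwise ignored.
    return candidate if VIDEO_ID_RE.match(candidate) else None


def placeholder_iframe(video_id):
    """Placeholder markup, NOT the repository's canonical iframe shape."""
    return f'<iframe src="https://www.youtube.com/embed/{video_id}"></iframe>'
```

Dropping the si parameter rather than carrying it into the embed is a choice made in this sketch, not something the design specifies.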

Standalone Twitter/X Status URLs

Accepted inputs:

https://twitter.com/<user>/status/<id>
https://x.com/<user>/status/<id>

Rewrite by fetching oEmbed HTML from publish.twitter.com/oembed with dnt=1 and hide_thread=true.

If the fetch fails, leave the line untouched and let validation fail with a clear error. The failure should name the file and line so the author can retry or adjust the post.
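
A minimal sketch of the fetch and its failure path follows, using only the standard library. The function names and the (html, reason) return convention are hypothetical; the existing helper in scripts/normalize_wp_embeds.py may look quite different. The opener parameter exists so tests can substitute a fake instead of touching the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OEMBED_ENDPOINT = "https://publish.twitter.com/oembed"


def oembed_request_url(status_url):
    """Build the oEmbed request URL with dnt=1 and hide_thread=true."""
    query = urlencode({"url": status_url, "dnt": "1", "hide_thread": "true"})
    return f"{OEMBED_ENDPOINT}?{query}"


def fetch_embed_html(status_url, opener=urlopen, timeout=10):
    """Return (html, None) on success or (None, reason) on failure."""
    try:
        with opener(oembed_request_url(status_url), timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError) as exc:
        # OSError covers network and HTTP errors; ValueError covers bad JSON.
        return None, f"oEmbed fetch failed: {exc}"
    html = payload.get("html") if isinstance(payload, dict) else None
    if not html:
        return None, "oEmbed response had no html field"
    return html.strip(), None
```

When the reason is not None, the caller would leave the line untouched and record the file, line number, and reason for the report.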

Standalone Generic URLs

If a line consists of a bare non-media URL with optional surrounding whitespace, rewrite it to:

[url](url)

Do not attempt prose-aware inline rewrites in v1.
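
The generic rule is small enough to sketch directly. The function name wrap_bare_url is hypothetical, and the sketch assumes media URLs have already been handled by the earlier rules. It does not escape brackets or parentheses inside the URL, which a real implementation may need to address.

```python
import re

# The whole line must be one http(s) URL plus optional surrounding whitespace.
BARE_URL_RE = re.compile(r"^\s*(https?://\S+)\s*$")


def wrap_bare_url(line):
    """Rewrite a standalone bare URL line to [url](url); leave others as-is."""
    match = BARE_URL_RE.match(line)
    if match is None:
        return line
    url = match.group(1)
    # Preserve the original line ending (LF, CRLF, or none).
    ending = line[len(line.rstrip("\r\n")):]
    return f"[{url}]({url}){ending}"
```

For example, a line containing only https://example.com/a becomes [https://example.com/a](https://example.com/a), while a sentence that merely mentions the URL is returned unchanged.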

Detection Boundaries

The transformer and validator should both use the same body scanner rules:

skip fenced code blocks
skip inline code spans
skip existing markdown links
skip raw HTML blocks and raw HTML tag lines

The intent is mechanical normalization, not markdown authorship assistance.
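
One way the shared scanner could look is sketched below. All names are hypothetical and the rules are deliberately simplified: fences are matched by marker character and length, and an HTML block is assumed to run from a line starting with a tag until the next blank line. Inline code spans and existing markdown links need no separate state here, because a line wrapped in backticks or brackets is never a whole-line bare URL.

```python
import re

FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")
HTML_OPEN_RE = re.compile(r"^\s{0,3}</?[A-Za-z!]")
STANDALONE_URL_RE = re.compile(r"^\s*https?://\S+\s*$")


def standalone_url_lines(text):
    """Return (line_number, url) pairs for rewrite candidates in the body."""
    lines = text.splitlines()
    start = 0
    # Skip YAML front matter delimited by "---" lines at the top of the file.
    if lines and lines[0].strip() == "---":
        for offset, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                start = offset + 1
                break
    found = []
    fence = None      # opening fence marker while inside a fenced code block
    in_html = False   # True from an HTML tag line until the next blank line
    for number, line in enumerate(lines[start:], start=start + 1):
        fence_match = FENCE_RE.match(line)
        if fence is not None:
            marker = fence_match.group(1) if fence_match else ""
            closes = marker[:1] == fence[0] and len(marker) >= len(fence)
            if closes and line.strip() == marker:
                fence = None
            continue
        if fence_match:
            fence = fence_match.group(1)
            continue
        if not line.strip():
            in_html = False
            continue
        if in_html or HTML_OPEN_RE.match(line):
            in_html = True
            continue
        if STANDALONE_URL_RE.match(line):
            found.append((number, line.strip()))
    return found
```

Having both the transformer and the validator call one function like this is what keeps their notion of "standalone" from drifting apart.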

Test Plan

Add tests/test_normalize_post_media_links.py covering:

standalone YouTube URL converts to iframe
standalone Twitter/X status URL converts using mocked oEmbed HTML
standalone generic URL wraps as markdown link
inline prose URL remains unchanged
code fence, inline code, markdown link, and HTML contexts remain unchanged
Twitter/X fetch failure leaves text unchanged and reports a reason

Extend tests/test_validate_posts.py to cover:

leftover standalone YouTube URL fails validation
leftover standalone Twitter/X status URL fails validation
leftover standalone generic bare URL on its own line fails validation
converted iframe and converted markdown link pass

Roll out the validator rule only for posts whose filename date is on or after the policy start date. This keeps the historical archive from failing CI while still enforcing the rule for new authoring.
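
The date gate could be sketched as follows. It assumes post filenames begin with a YYYY-MM-DD- prefix and that files without a valid date prefix are exempt; both are assumptions, as is the function name. The policy start date is passed in because this design does not fix a value for it.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Assumed filename convention: YYYY-MM-DD-title.md
FILENAME_DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-")


def rule_applies(path, policy_start):
    """True when the post's filename date is on or after policy_start."""
    match = FILENAME_DATE_RE.match(PurePosixPath(path).name)
    if match is None:
        return False  # assumption: undated files are not subject to the rule
    try:
        post_date = date(*map(int, match.groups()))
    except ValueError:
        return False  # digits present but not a real calendar date
    return post_date >= policy_start
```

The validator would call this once per post and run the new detection pass only when it returns True.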

Risks

Commit-time network dependency

Twitter/X conversion now happens in the default pre-commit path. If the oEmbed endpoint is slow or unavailable, commits that stage Twitter/X links will fail validation until the fetch succeeds.

This is acceptable because the user explicitly prefers convenience over strict offline safety for this workflow.

False positives

Line-oriented detection can overreach if it scans inside code or HTML. The scanner must stay narrow and be covered by tests for those contexts.

Decision

Solve issue #105 with one versioned transformer plus one shared validator path. Do not bury logic directly in the git hook and do not rely on manual cleanup after the post is already written.