Define the canonical post-body contract for issue #122 so new authoring and archive cleanup both target one stable shape.
The repository should keep _posts/ as the canonical source of truth, including the real filename and YAML front matter, while making the body itself as portable and markdown-first as possible.
The current repository already has partial policy spread across multiple places:
- `scripts/normalize_post_media_links.py` rewrites some standalone URLs
- `scripts/validate_posts.py` enforces some body-shape rules
- `<img>`, `<figure>`, `<blockquote>`, `<br>`, styled wrappers, and provider-specific embed wrappers

Without one written contract, the repository risks growing two different standards:
That split would turn future cleanup into taste debates instead of mechanical normalization.
The canonical contract is:
- `_posts/` remains canonical, including filename and front matter

Use plain markdown.
No raw HTML should be used for spacing, typography, or simple layout control.
That means the canonical contract rejects:
- `<p>` for ordinary prose
- `<div>` wrappers used only for layout
- `<span>` used only for styling
- `<br>` or used only to create vertical rhythm

These are portability and cleanup liabilities, not authored signal.
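
As a rough illustration of how the rejected tags above could be surfaced for review, here is a minimal sketch. The function name `find_layout_html` is hypothetical and is not taken from `scripts/validate_posts.py`. It flags every opening `<p>`, `<div>`, `<span>`, and `<br>` tag, so a human or a later rule would still need to decide whether a given hit is layout-only.

```python
import re

# Tags the contract rejects when they serve only spacing, typography, or layout.
LAYOUT_TAGS = ("p", "div", "span", "br")
TAG_RE = re.compile(r"<\s*(" + "|".join(LAYOUT_TAGS) + r")\b[^>]*>", re.IGNORECASE)


def find_layout_html(body: str) -> list[tuple[int, str]]:
    """Return (line_number, tag_name) for each opening layout tag in the body."""
    hits = []
    for number, line in enumerate(body.splitlines(), start=1):
        for match in TAG_RE.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits
```

This sketch does not skip fenced code blocks, so tags quoted as examples inside a post would also be reported.
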
Use markdown blockquotes only.
Canonical shape:
> quoted text
>
> second paragraph
The canonical contract does not allow raw `<blockquote>` for ordinary quote rendering and does not rely on special blockquote classes.
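
For archive cleanup, the canonical blockquote shape is simple enough to generate mechanically. The following is a minimal sketch, assuming the quoted text has already been extracted from any raw markup. The helper name `to_markdown_blockquote` is hypothetical.

```python
def to_markdown_blockquote(text: str) -> str:
    """Prefix each line with '> ', using a bare '>' for blank separator lines."""
    lines = text.strip().splitlines()
    return "\n".join("> " + line.strip() if line.strip() else ">" for line in lines)
```

Called with `"quoted text\n\nsecond paragraph"`, it returns the three-line shape shown above.
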
Use markdown image syntax for normal post images.
Canonical shape:

Optional caption shape:

This is the caption.
Rules:
- no `<img>` for ordinary images
- no `<figure>`
- no `<figcaption>`

This keeps the body portable and makes archive cleanup mechanical.
If a future need appears that markdown truly cannot express, it should be treated as a new policy decision rather than left implicit.
These standalone URL shapes are valid author input when they appear on their own line:
These URLs should be treated as author-friendly input syntax, not as the final committed canonical body shape.
Before commit, a supported standalone embed URL should be replaced by:
The original raw URL line should not remain in the committed body once normalization succeeds.
Canonical pattern:
<iframe ...></iframe>
*Source:* [YouTube](https://example.com)
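
The replacement step can be sketched roughly as follows. This is an illustration under stated assumptions, not the behavior of `scripts/normalize_post_media_links.py`. The `HOST_LABELS` table, the `render_embed` placeholder, and the function name `normalize_line` are all hypothetical, and real embed HTML differs per provider.

```python
import re
from urllib.parse import urlparse

# Hypothetical host-to-label table; the actual supported URL shapes may differ.
HOST_LABELS = {
    "www.youtube.com": "YouTube",
    "youtu.be": "YouTube",
}

# A line counts as a standalone URL only if the URL is the whole line.
STANDALONE_URL_RE = re.compile(r"^https?://\S+$")


def render_embed(url: str) -> str:
    """Placeholder only: real embed HTML varies by provider."""
    return f'<iframe src="{url}"></iframe>'


def normalize_line(line: str) -> list[str]:
    """Replace a supported standalone URL line with embed HTML plus a source line."""
    stripped = line.strip()
    if not STANDALONE_URL_RE.match(stripped):
        return [line]  # ordinary prose, or a URL embedded in a sentence
    label = HOST_LABELS.get(urlparse(stripped).netloc.lower())
    if label is None:
        return [line]  # unsupported standalone URLs stay untouched
    return [render_embed(stripped), f"*Source:* [{label}]({stripped})"]
```

Returning the original line unchanged for unsupported hosts keeps the sketch from making any decision about bare URLs outside the supported set.
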
The exact embed HTML varies by provider, but the body-level contract is stable:
Use fixed platform-name labels in v1.
Canonical labels:
- YouTube
- X
- Mastodon
- Bluesky
- Threads

Canonical source shape:
*Source:* [YouTube](https://...)
Do not fetch post titles or remote metadata for the label in this issue. Deterministic platform labels are simpler and more stable.
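
Because the labels are fixed, a source line can be checked without any network access. Here is a minimal sketch of such a check. The function name `is_canonical_source_line` and the exact regular expression are assumptions for illustration, not existing validator code.

```python
import re

# The fixed v1 platform labels from the contract.
CANONICAL_LABELS = {"YouTube", "X", "Mastodon", "Bluesky", "Threads"}

# Matches: *Source:* [Label](https://...)
SOURCE_LINE_RE = re.compile(r"^\*Source:\* \[([^\]]+)\]\((https?://[^\s)]+)\)$")


def is_canonical_source_line(line: str) -> bool:
    """True if the line has the source shape and uses one of the fixed labels."""
    match = SOURCE_LINE_RE.match(line.strip())
    return bool(match) and match.group(1) in CANONICAL_LABELS
```

A line whose label is a fetched post title would fail this check, which is the deterministic behavior the contract asks for.
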
Unsupported standalone bare URLs are tolerated for now and remain untouched by the contract in issue #122.
This issue does not decide whether unsupported standalone URLs should:
That policy belongs to follow-up cleanup work, especially archive cleanup issue #124.
Representative current examples:
Issue #122 allows raw HTML only in these cases:
Issue #122 does not allow raw HTML for:
This means most existing `<img>`, `<figure>`, `<figcaption>`, `<blockquote>`, `<br>`, and styled container markup is cleanup debt, not part of the new canonical standard.
Ordinary image or figure markup that is likely portable as markdown:
Raw blockquote and spacing markup that should become markdown:
Supported embed HTML already present in the repo:
Issue #122 defines the contract only:
This contract implies:
- `scripts/normalize_post_media_links.py` should become the normalization surface for supported standalone embed URLs
- `scripts/validate_posts.py` should validate against the canonical shapes defined here