@gurupanguji

Portable Post-Body Contract Design

Goal

Define the canonical post-body contract for issue #122 so new authoring and archive cleanup both target one stable shape.

The repository should keep _posts/ as the canonical source of truth, including the real filename and YAML front matter, while making the body itself as portable and markdown-first as possible.

Problem

The current repository already has partial policy spread across multiple places:

Without one written contract, the repository risks growing two different standards:

  1. one standard for newly authored posts
  2. another standard for archive cleanup

That split would turn future cleanup into taste debates instead of mechanical normalization.

Decision

The canonical contract is:

Canonical Body Shapes

1. Paragraphs and headings

Use plain markdown.

Do not use raw HTML for spacing, typography, or simple layout control.

That means the canonical contract rejects:

Such markup is a portability and cleanup liability; it carries no authored signal.

2. Blockquotes

Use markdown blockquotes only.

Canonical shape:

> quoted text
>
> second paragraph

The canonical contract does not allow raw <blockquote> for ordinary quote rendering and does not rely on special blockquote classes.
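To make the canonical blockquote shape concrete, here is a minimal Python sketch that renders already-extracted quote paragraphs in that shape. The function name `to_markdown_blockquote` is hypothetical and not part of the repository. Extracting the paragraphs from existing raw <blockquote> markup is assumed to happen elsewhere.

```python
def to_markdown_blockquote(paragraphs):
    """Render paragraphs as one markdown blockquote.

    Each line is prefixed with "> " and paragraphs are separated by a
    bare ">" line, matching the canonical shape shown above.
    """
    quoted = []
    for paragraph in paragraphs:
        lines = [
            f"> {line.strip()}" if line.strip() else ">"
            for line in paragraph.strip().splitlines()
        ]
        quoted.append("\n".join(lines))
    return "\n>\n".join(quoted)


if __name__ == "__main__":
    print(to_markdown_blockquote(["quoted text", "second paragraph"]))
```

Run as a script, this prints the three-line canonical shape shown above.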

3. Images

Use markdown image syntax for normal post images.

Canonical shape:

![Alt text](/assets/images/blog/filename.jpg)

Optional caption shape:

![Alt text](/assets/images/blog/filename.jpg)

This is the caption.

Rules:

This keeps the body portable and makes archive cleanup mechanical.
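As an illustration of how mechanical that cleanup could be, the sketch below converts a lone raw <img> tag into the canonical markdown image shape using Python's standard `html.parser`. The helper name `img_tag_to_markdown` is hypothetical. It assumes the fragment holds exactly one image with a `src` attribute and returns `None` otherwise, so that ambiguous markup is left for a human decision. It does not escape special characters in alt text.

```python
from html.parser import HTMLParser


class _ImgCollector(HTMLParser):
    """Collect (alt, src) pairs for every <img> tag in a fragment."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("alt") or "", attr_map.get("src") or ""))


def img_tag_to_markdown(html):
    """Return markdown image syntax for a single <img>, else None."""
    collector = _ImgCollector()
    collector.feed(html)
    collector.close()
    # Only convert the unambiguous case: exactly one image with a src.
    if len(collector.images) != 1 or not collector.images[0][1]:
        return None
    alt, src = collector.images[0]
    return f"![{alt}]({src})"


if __name__ == "__main__":
    print(img_tag_to_markdown('<img src="/assets/images/blog/filename.jpg" alt="Alt text">'))
```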

If a future need appears that markdown truly cannot express, it should be treated as a new policy decision rather than left implicit.

4. Supported standalone embed inputs

These standalone URL shapes are valid author input when they appear on their own line:

These URLs should be treated as author-friendly input syntax, not as the final committed canonical body shape.
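The "on their own line" condition can be checked mechanically. The following Python sketch finds lines that consist of nothing but a URL and skips fenced code blocks. The name `standalone_url_lines` is hypothetical. The sketch does not decide which providers are supported; that filtering is assumed to be a separate step.

```python
import re

# A line qualifies only if the whole line is a single http(s) URL.
URL_ONLY = re.compile(r"https?://\S+")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def standalone_url_lines(body):
    """Return (line_number, url) pairs for URLs that sit alone on a line."""
    found = []
    in_fence = False
    for number, line in enumerate(body.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        candidate = line.rstrip()
        if not in_fence and URL_ONLY.fullmatch(candidate):
            found.append((number, candidate))
    return found


if __name__ == "__main__":
    sample = "Intro text\n\nhttps://example.com/watch\n\nSee https://example.com/inline here.\n"
    print(standalone_url_lines(sample))
```

Indented lines are deliberately not matched, since leading whitespace may indicate a list continuation or an indented code block.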

5. Supported embed output shape

Before commit, a supported standalone embed URL should be replaced by:

  1. canonical embed HTML
  2. a visible markdown source line directly under it

The original raw URL line should not remain in the committed body once normalization succeeds.

Canonical pattern:

<iframe ...></iframe>
*Source:* [YouTube](https://example.com)
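One way to check that pairing is a small lint pass. The sketch below is written in Python and assumes, purely for illustration, that the embed HTML is an <iframe> whose closing tag ends a line. Other providers may use different markup. The function name `embeds_missing_source` is hypothetical.

```python
import re

# Matches the source-line shape: *Source:* [Label](url)
SOURCE_LINE = re.compile(r"\*Source:\* \[[^\]]+\]\(\S+\)")


def embeds_missing_source(body):
    """Return 1-based line numbers of embeds not followed by a source line."""
    lines = body.splitlines()
    missing = []
    for index, line in enumerate(lines):
        if line.rstrip().endswith("</iframe>"):
            # The source line must sit directly under the embed.
            next_line = lines[index + 1].strip() if index + 1 < len(lines) else ""
            if not SOURCE_LINE.fullmatch(next_line):
                missing.append(index + 1)
    return missing


if __name__ == "__main__":
    good = '<iframe src="https://example.com"></iframe>\n*Source:* [YouTube](https://example.com)'
    print(embeds_missing_source(good))
```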

The exact embed HTML varies by provider, but the body-level contract is stable:

6. Source-line label rule

Use fixed platform-name labels in v1.

Canonical labels:

Canonical source shape:

*Source:* [YouTube](https://...)

Do not fetch post titles or remote metadata for the label in this issue. Deterministic platform labels are simpler and more stable.
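A deterministic label lookup can be sketched in a few lines of Python. The `PLATFORM_LABELS` table and the `source_line` helper are hypothetical. Only the YouTube label is taken from the example above, and the host names mapped to it are an assumption for illustration. No network access is involved.

```python
from urllib.parse import urlsplit

# Hypothetical host-to-label table; extend it per the canonical label list.
PLATFORM_LABELS = {
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
}


def source_line(url):
    """Return the canonical source line for a supported URL, else None."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    label = PLATFORM_LABELS.get(host)
    if label is None:
        return None  # unsupported URLs are left untouched
    return f"*Source:* [{label}]({url})"


if __name__ == "__main__":
    print(source_line("https://www.youtube.com/watch?v=abc"))
```

Because the label depends only on the host name, the same input always yields the same committed line.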

7. Unsupported standalone URLs

Unsupported standalone bare URLs are tolerated for now and remain untouched by the contract in issue #122.

This issue does not decide whether unsupported standalone URLs should:

That policy belongs to follow-up cleanup work, especially archive cleanup issue #124.

Representative current examples:

Allowed HTML

Issue #122 allows raw HTML only in these cases:

Issue #122 does not allow raw HTML for:

This means most existing <img>, <figure>, <figcaption>, <blockquote>, <br>, and styled container markup is cleanup debt, not part of the new canonical standard.
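To size that cleanup debt, a scan along the following lines could flag post bodies that still contain the tags named above. This is a Python sketch using the standard `html.parser`. The name `cleanup_debt_tags` is hypothetical, and the sketch does not detect styled container markup, which would need attribute-level rules.

```python
from html.parser import HTMLParser

# Tags named above as cleanup debt.
CLEANUP_TAGS = {"img", "figure", "figcaption", "blockquote", "br"}


class _TagScanner(HTMLParser):
    """Record which cleanup-debt tags appear in a post body."""

    def __init__(self):
        super().__init__()
        self.hits = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag in CLEANUP_TAGS:
            self.hits.add(tag)


def cleanup_debt_tags(body):
    """Return the sorted list of cleanup-debt tags found in body."""
    scanner = _TagScanner()
    scanner.feed(body)
    scanner.close()
    return sorted(scanner.hits)


if __name__ == "__main__":
    print(cleanup_debt_tags('<figure><img src="a.jpg" alt="a"></figure>\ntext<br>'))
```

The scan treats the body as one HTML stream, so tags that appear inside code samples would also be reported and may need manual review.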

Archive Examples That Inform This Contract

HTML that should move to markdown

Ordinary image or figure markup that is likely portable as markdown:

Raw blockquote and spacing markup that should become markdown:

HTML that remains allowed under this contract

Supported embed HTML already present in the repo:

Responsibilities By Issue

Issue #122

Define the contract only:

Follow-up issues

Implementation Consequences

This contract implies:

Acceptance Criteria

  1. The repository has one written source of truth for canonical post-body shapes.
  2. The contract clearly distinguishes markdown-first content from the small HTML surface that remains allowed.
  3. Supported standalone embed URLs have an explicit input shape and explicit canonical output shape.
  4. The contract preserves a visible markdown source URL under generated embeds.
  5. Unsupported standalone URLs are explicitly left for follow-up work instead of being decided accidentally during implementation.