@gurupanguji

Bluesky Link Facets Design

Goal

Fix issue #79 by updating Bluesky publishing so links in _snippets are posted as real Bluesky rich-text links instead of plain text.

The fix should support both:

  • raw URLs in snippet text
  • markdown links of the form [label](url)

Scope

Version 1 should stay narrow and reliable.

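Non-Goals

  • turning the social publisher into a general markdown engine
  • rich-text link handling for platforms other than Bluesky in v1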

Problem Summary

Issue #79 describes a real gap in the current Bluesky path.

Right now publish_bluesky() sends:

client.send_post(text=text, embed=embed)

That means Bluesky receives plain text only. Even when the snippet includes a URL, the publisher does not provide link facets. Bluesky treats text as a link only where a facet marks it, so these URLs are published as unlinked text. That is inconsistent with the platform's rich-text model.

Existing Content Shape

Current Bluesky snippets in _snippets/ mostly end with raw canonical URLs, for example:

... I linked it here: https://gurupanguji.com/blog/2026/03/18/a-website-to-destroy-all-websites/

That means a raw URL parser alone would already cover most existing content.

But the accepted design should also support markdown links in the snippet text. When a snippet contains:

[this essay](https://example.com)

the published Bluesky post should contain:

this essay

with a link facet applied to the visible label span.
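For reference, a link facet in Bluesky's app.bsky.richtext.facet lexicon pairs a byte range (byteStart inclusive, byteEnd exclusive) with a features list holding an app.bsky.richtext.facet#link entry. The sketch below builds that facet for the "this essay" example, assuming the post text is just the short sentence shown and using plain dicts rather than any SDK model type. The function name link_facet is hypothetical.

```python
def link_facet(text: str, start: int, end: int, uri: str) -> dict:
    """Build a Bluesky link facet for text[start:end] (character indexes).

    Character positions are converted to UTF-8 byte offsets, which is what
    the facet index expects. byteEnd is exclusive.
    """
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = len(text[:end].encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [
            {"$type": "app.bsky.richtext.facet#link", "uri": uri},
        ],
    }


text = "Read this essay today."
start = text.index("this essay")
facet = link_facet(text, start, start + len("this essay"), "https://example.com")
print(facet["index"])  # {'byteStart': 5, 'byteEnd': 15}
```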

Add one shared helper dedicated to Bluesky rich-text preparation.

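Input:

  • the authored snippet text, which may contain raw URLs and markdown links

Output:

  • the final visible text, with markdown links rewritten to their labels
  • a list of Bluesky link facets whose byte ranges index into that text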

This helper should be used only by the Bluesky publishing path in v1.

Parsing Rules

Markdown links

Pattern:

[label](https://example.com)

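Behavior:

  • replace the whole markdown link with its label in the visible text
  • emit one link facet that covers the label span and points at the URL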
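A minimal sketch of the markdown-link rewrite, assuming a deliberately narrow pattern: a bracketed label with no nested brackets, followed by an http(s) URL containing no whitespace or parentheses. The pattern, the function name rewrite_markdown_links, and the (start, end, url) span shape are all hypothetical. Spans are character indexes into the rewritten text; converting them to UTF-8 byte offsets is a separate step.

```python
import re

# Hypothetical v1 pattern: [label](http(s)://...). Anything else stays plain text.
MARKDOWN_LINK = re.compile(r"\[([^\[\]]+)\]\((https?://[^\s()]+)\)")


def rewrite_markdown_links(text: str) -> tuple[str, list[tuple[int, int, str]]]:
    """Replace each [label](url) with label.

    Returns the rewritten text plus (start, end, url) character spans that
    cover each label in the rewritten text (end is exclusive).
    """
    out: list[str] = []
    spans: list[tuple[int, int, str]] = []
    pos = 0      # read position in the original text
    out_len = 0  # length of the rewritten text built so far
    for m in MARKDOWN_LINK.finditer(text):
        before = text[pos:m.start()]
        out.append(before)
        out_len += len(before)
        label, url = m.group(1), m.group(2)
        spans.append((out_len, out_len + len(label), url))
        out.append(label)
        out_len += len(label)
        pos = m.end()
    out.append(text[pos:])
    return "".join(out), spans


print(rewrite_markdown_links("Read [this essay](https://example.com) now."))
# ('Read this essay now.', [(5, 15, 'https://example.com')])
```

Tracking out_len while building the output keeps each span aligned with the final visible text, which is the text the byte offsets must be measured against.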

Raw URLs

Pattern:

https://example.com/path

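Behavior:

  • keep the URL in the visible text unchanged
  • emit one link facet that covers the URL span, after trailing-punctuation trimming, and points at that URL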
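A minimal sketch of raw URL detection, assuming v1 treats any http(s) URL as running up to the next whitespace. The regex and the function name find_raw_urls are hypothetical. This step deliberately over-captures trailing prose punctuation, which a separate trimming step removes.

```python
import re

# Hypothetical v1 pattern: an http(s) URL runs until the next whitespace.
RAW_URL = re.compile(r"https?://\S+")


def find_raw_urls(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, url) character spans; the URL itself stays visible."""
    return [(m.start(), m.end(), m.group()) for m in RAW_URL.finditer(text)]


print(find_raw_urls("I linked it here: https://gurupanguji.com/blog/"))
# [(18, 47, 'https://gurupanguji.com/blog/')]
```

Because this pattern would also match the (https://...) part of a markdown link, it should only run over text that the markdown pass has left untouched.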

Trailing punctuation

For raw URLs, strip trailing punctuation from the facet target and facet span when the punctuation is prose punctuation rather than part of the URL.

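Examples to trim:

  • a sentence-ending period, as in "see https://example.com."
  • a trailing comma, semicolon, colon, exclamation mark, or question mark
  • a closing parenthesis with no matching opener inside the URL, as in "(see https://example.com)"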

This follows the approach in Bluesky's rich-text documentation and avoids broken links caused by sentence punctuation.
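A minimal sketch of the trimming rule, assuming a hypothetical set of prose punctuation characters and the common heuristic of dropping a closing parenthesis only when the URL contains no matching opener. The name trim_trailing_punctuation is hypothetical.

```python
# Hypothetical set of prose punctuation to strip from the end of a raw URL.
TRAILING_PUNCTUATION = ".,;:!?'\""


def trim_trailing_punctuation(url: str) -> str:
    """Strip sentence punctuation, and an unbalanced ')', from the end of a URL."""
    while url:
        last = url[-1]
        if last in TRAILING_PUNCTUATION:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Only drop ')' when nothing inside the URL opened it,
            # e.g. "(see https://example.com)".
            url = url[:-1]
        else:
            break
    return url


print(trim_trailing_punctuation("https://example.com/path."))
print(trim_trailing_punctuation("https://en.wikipedia.org/wiki/Foo_(bar)"))
```

The facet span then ends at start + len(trimmed), so the facet target and the visible span shrink together, as the rule requires.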

Overlaps

Facets must not overlap.

The parser should process markdown links first, because those segments rewrite the visible text. After that, it can process raw URLs in the remaining plain text regions. If overlap is detected anyway, discard the later overlapping facet rather than emitting invalid data.
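A minimal sketch of the overlap guard, assuming spans arrive in processing order (markdown-link spans first, then raw URL spans) so that "later" simply means later in the list. The names drop_overlaps and Span are hypothetical.

```python
Span = tuple[int, int, str]  # (start, end, url), end exclusive


def drop_overlaps(spans: list[Span]) -> list[Span]:
    """Keep spans in the order given; discard any span overlapping one already kept."""
    kept: list[Span] = []
    for start, end, url in spans:
        # Two half-open ranges are disjoint when one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, url))
    # Sorted by position so the emitted facets read left to right.
    return sorted(kept)


print(drop_overlaps([(5, 15, "a"), (10, 20, "b"), (20, 25, "c")]))
# [(5, 15, 'a'), (20, 25, 'c')]
```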

UTF-8 Indexing Requirement

This is the fragile part.

Bluesky facet ranges are indexed using UTF-8 byte offsets, not Python string indexes and not JavaScript UTF-16 offsets. The helper must calculate each facet's byteStart (inclusive) and byteEnd (exclusive) from the final visible text after markdown link rewriting.

In Python, the safest shape is:

len(text[:index].encode("utf-8"))

for converting character positions in the final visible text into UTF-8 byte offsets.

We should explicitly test this with non-ASCII content before the link span.
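The sketch below shows why the three index systems disagree, using a text that places a non-BMP emoji and an accented letter before the link. The function name char_to_byte_offset is hypothetical; the conversion itself is the len(text[:index].encode("utf-8")) expression described above. The UTF-16 figure is included only for contrast with JavaScript.

```python
def char_to_byte_offset(text: str, index: int) -> int:
    """Convert a Python character index into a UTF-8 byte offset in text."""
    return len(text[:index].encode("utf-8"))


text = "🦋 café: https://example.com"
start = text.index("https://")  # Python character index
end = len(text)

byte_start = char_to_byte_offset(text, start)  # what Bluesky expects
byte_end = char_to_byte_offset(text, end)
utf16_start = len(text[:start].encode("utf-16-le")) // 2  # JavaScript-style index

print(start, utf16_start, byte_start)  # 8 9 12
print(end, byte_end)                   # 27 31
```

"🦋" is one Python character, two UTF-16 code units, and four UTF-8 bytes, and "é" is two UTF-8 bytes, so using the character index directly would shift the facet four bytes to the left.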

Implementation Shape

Keep the code simple.

Recommended structure:

  1. Add a small helper for Bluesky text preparation inside scripts/publish_social.py or extract a small sibling helper module if the file starts getting harder to hold in one pass.
  2. Return a tuple or small object:
    • text
    • facets
  3. Update publish_bluesky() to call the helper before posting.
  4. Send the post with text and facets, keeping the existing embed argument.

The helper should be deterministic and free of network calls. That keeps it easy to test.
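Putting the parsing rules together, here is a minimal sketch of the helper, assuming Python 3.9+ and plain-dict facets. The names prepare_bluesky_text and BlueskyText, the regexes, and the punctuation set are hypothetical, not existing code in scripts/publish_social.py. It follows the stated order: markdown links first, then raw URLs searched only in the untouched plain-text regions, with an overlap guard as a backstop.

```python
import re
from typing import NamedTuple

# Hypothetical v1 patterns, kept deliberately narrow.
MARKDOWN_LINK = re.compile(r"\[([^\[\]]+)\]\((https?://[^\s()]+)\)")
RAW_URL = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?'\""


class BlueskyText(NamedTuple):
    text: str           # final visible text
    facets: list[dict]  # link facets with UTF-8 byte ranges into `text`


def _trim(url: str) -> str:
    """Strip prose punctuation and an unbalanced ')' from the end of a raw URL."""
    while url:
        if url[-1] in TRAILING_PUNCTUATION:
            url = url[:-1]
        elif url[-1] == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url


def _byte_offset(text: str, index: int) -> int:
    """Character index -> UTF-8 byte offset, measured in the final visible text."""
    return len(text[:index].encode("utf-8"))


def prepare_bluesky_text(snippet: str) -> BlueskyText:
    """Rewrite [label](url) to label and build Bluesky link facets. No network calls."""
    parts: list[str] = []
    plain_regions: list[tuple[int, int]] = []  # character ranges of untouched text
    spans: list[tuple[int, int, str]] = []     # (start, end, url), end exclusive

    # Pass 1: markdown links, because they change the visible text.
    pos = length = 0
    for m in MARKDOWN_LINK.finditer(snippet):
        plain, label = snippet[pos:m.start()], m.group(1)
        plain_regions.append((length, length + len(plain)))
        length += len(plain)
        spans.append((length, length + len(label), m.group(2)))
        length += len(label)
        parts += [plain, label]
        pos = m.end()
    tail = snippet[pos:]
    plain_regions.append((length, length + len(tail)))
    parts.append(tail)
    text = "".join(parts)

    # Pass 2: raw URLs, searched only inside the plain regions.
    for region_start, region_end in plain_regions:
        for m in RAW_URL.finditer(text, region_start, region_end):
            url = _trim(m.group())
            start, end = m.start(), m.start() + len(url)
            overlaps = any(start < e and s < end for s, e, _ in spans)
            if not overlaps:  # discard a later overlapping facet instead of emitting it
                spans.append((start, end, url))

    facets = [
        {
            "index": {"byteStart": _byte_offset(text, s), "byteEnd": _byte_offset(text, e)},
            "features": [{"$type": "app.bsky.richtext.facet#link", "uri": url}],
        }
        for s, e, url in sorted(spans)
    ]
    return BlueskyText(text, facets)


result = prepare_bluesky_text(
    "Read [this essay](https://example.com). Also https://gurupanguji.com/blog/."
)
print(result.text)  # Read this essay. Also https://gurupanguji.com/blog/.
```

Re-encoding a prefix per offset is quadratic in principle, but snippets are short, and the simple form is easier to verify. The facets are plain dicts; whether the Bluesky client in use accepts dicts directly or needs its own typed facet models is an assumption to check.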

Testing Strategy

Add focused tests around the helper and the Bluesky publish path.

Minimum cases:

  1. raw URL in snippet text produces one link facet
  2. markdown link rewrites visible text and produces one link facet
  3. mixed raw URL and markdown link produce two non-overlapping facets
  4. trailing punctuation is excluded from the facet span for raw URLs
  5. Unicode before the linked span still produces correct UTF-8 byte offsets
  6. existing Bluesky image embed flow still builds a valid post payload

If mocking the entire Bluesky client is awkward, test the helper directly and keep the publish-path test light.
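A light publish-path test can use unittest.mock.Mock in place of the Bluesky client and only check the arguments passed to send_post. In this sketch, publish_bluesky is a hypothetical shape of the updated publisher, not the current code, and the text-preparation helper is injected as prepare so the test can stub it. It assumes the client's send_post accepts a facets keyword alongside text and embed, which the atproto Python SDK's Client.send_post appears to do; confirm against the installed SDK version.

```python
from unittest import mock


def publish_bluesky(client, snippet: str, prepare, embed=None):
    """Hypothetical updated publisher: prepare text + facets, then post once."""
    text, facets = prepare(snippet)
    return client.send_post(text=text, facets=facets, embed=embed)


def test_publish_bluesky_sends_facets_and_keeps_embed():
    client = mock.Mock()
    embed = object()  # stands in for the existing image embed
    fake_facet = {
        "index": {"byteStart": 0, "byteEnd": 19},
        "features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"}],
    }
    publish_bluesky(
        client, "https://example.com", prepare=lambda s: (s, [fake_facet]), embed=embed
    )
    client.send_post.assert_called_once_with(
        text="https://example.com", facets=[fake_facet], embed=embed
    )


test_publish_bluesky_sends_facets_and_keeps_embed()
```

Stubbing prepare keeps this test independent of the parser, so the helper's own tests carry the parsing and byte-offset cases.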

Risks

Wrong byte offsets

This is the main risk. If we use Python character indexes directly, Bluesky may receive malformed facet ranges.

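Mitigation:

  • convert character positions with len(text[:index].encode("utf-8")), always measured against the final visible text
  • test with non-ASCII text before the link span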

Parser drift

A homegrown parser can grow into a mess if it starts trying to handle every markdown edge case.

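Mitigation:

  • support only [label](url) markdown links and raw URLs in v1
  • leave every other markdown construct as plain text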

Bad overlap behavior

If raw URL parsing runs over already rewritten markdown-link output, we could create overlapping or duplicated facets.

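Mitigation:

  • process markdown links first and scan only the remaining plain-text regions for raw URLs
  • discard any later facet that overlaps an earlier one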

Rollout

  1. add the helper
  2. add tests for raw URLs, markdown links, punctuation, and unicode
  3. update the Bluesky publisher to send facets
  4. run the relevant test suite
  5. dry-run the social publisher if possible
  6. land through branch and PR

Decision

Issue #79 should be solved with one narrow Bluesky text-preparation helper that converts authored snippet text into the final visible text plus link facets indexed by UTF-8 byte offsets.

This fixes today’s missing-link problem without turning the social publisher into a general markdown engine.