@gurupanguji

Goal

Fix issue #79 by updating Bluesky publishing so links in _snippets are posted as real Bluesky rich-text links instead of plain text.

The fix should support both:

raw URLs already present in the snippet text
markdown links such as [label](https://example.com)

Scope

Version 1 should stay narrow and reliable.

Update Bluesky publishing only
Parse raw URLs in the outgoing Bluesky snippet text
Parse markdown links in the outgoing Bluesky snippet text
Convert markdown links into plain visible label text with Bluesky link facets
Compute facet ranges using UTF-8 byte offsets, as Bluesky requires
Keep existing image embed behavior unchanged
Add focused tests for the parser and the final publish payload shape

Non-Goals

Do not change snippet text for other platforms
Do not redesign _snippets authoring format
Do not add mention or hashtag facet support in this pass
Do not rewrite the snippet generator unless a test proves it is necessary
Do not broaden this into a general markdown renderer

Problem Summary

Issue #79 describes a real gap in the current Bluesky path.

Right now publish_bluesky() sends:

client.send_post(text=text, embed=embed)

That means Bluesky receives plain text only. Even when the snippet includes a URL, the publisher does not provide link facets, so the URL is posted as ordinary text rather than as a rich-text link. The post falls short of the platform’s rich-text model, where links exist only where facets mark them.

Existing Content Shape

Current Bluesky snippets in _snippets/ mostly end with raw canonical URLs, for example:

... I linked it here: https://gurupanguji.com/blog/2026/03/18/a-website-to-destroy-all-websites/

That means a raw URL parser already solves most existing content.

But the accepted design should also support markdown links in the snippet text. When a snippet contains:

[this essay](https://example.com)

the published Bluesky post should contain:

this essay

with a link facet applied to the visible label span.

Recommended Design

Add one shared helper dedicated to Bluesky rich-text preparation.

Input:

snippet text as authored in _snippets

Output:

final plain visible Bluesky text
a list of Bluesky link facets

This helper should be used only by the Bluesky publishing path in v1.

Parsing Rules

Markdown links

Pattern:

[label](https://example.com)

Behavior:

remove the markdown syntax from the visible text
keep only label in the outgoing Bluesky text
create one link facet over the visible label span
use the URL as the facet target
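The markdown-link rule above can be sketched as follows. This is a minimal illustration, not the final implementation: the names MD_LINK and rewrite_markdown_links are hypothetical, and the pattern assumes http(s) targets with no parentheses inside the URL and no nested brackets in the label. Spans are returned as character indexes into the rewritten text; converting them to byte offsets is a separate step.

```python
import re

# Hypothetical pattern: inline links only, http(s) targets only.
# Labels may not contain "]" and URLs may not contain ")" or whitespace.
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def rewrite_markdown_links(text):
    """Return (visible_text, spans); spans are (char_start, char_end, url)."""
    parts = []
    spans = []
    pos = 0       # read position in the authored text
    out_len = 0   # length of the visible text built so far
    for m in MD_LINK.finditer(text):
        before = text[pos:m.start()]
        label, url = m.group(1), m.group(2)
        start = out_len + len(before)
        # The facet covers the label only, not the removed markdown syntax.
        spans.append((start, start + len(label), url))
        parts.extend([before, label])
        out_len = start + len(label)
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), spans
```

For example, rewrite_markdown_links("Read [this essay](https://example.com) now") yields the visible text "Read this essay now" with one span covering "this essay".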

Raw URLs

Pattern:

https://example.com/path

Behavior:

keep the raw URL visible in the outgoing Bluesky text
create one link facet over the URL span
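A raw URL scan can be sketched like this. RAW_URL and find_raw_urls are hypothetical names, and the pattern is an assumption: it treats any run of non-whitespace characters after http:// or https:// as the URL, so trailing punctuation still has to be trimmed afterwards.

```python
import re

# Hypothetical pattern: http(s) scheme followed by non-whitespace characters.
# Deliberately greedy; trailing prose punctuation is handled in a later step.
RAW_URL = re.compile(r"https?://[^\s<>]+")


def find_raw_urls(text):
    """Return (char_start, char_end, url) for each raw URL in text."""
    return [(m.start(), m.end(), m.group(0)) for m in RAW_URL.finditer(text)]
```

The visible text is left untouched; only the spans are recorded.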

Trailing punctuation

For raw URLs, strip trailing punctuation from the facet target and facet span when the punctuation is prose punctuation rather than part of the URL.

Examples to trim:

.
,
;
!
?
unmatched )

This follows the approach described in Bluesky’s rich-text documentation and avoids broken links caused by sentence punctuation.
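One way to express the trimming rule, as a sketch. The function name trim_trailing_punctuation is hypothetical, and "unmatched )" is interpreted here as a closing parenthesis with no corresponding opening parenthesis inside the URL itself.

```python
def trim_trailing_punctuation(url):
    """Strip trailing prose punctuation and unmatched ')' from a raw URL."""
    while url:
        last = url[-1]
        if last in ".,;!?":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More closers than openers: the ")" belongs to the sentence.
            url = url[:-1]
        else:
            break
    return url
```

With this rule, "https://example.com)," is trimmed to "https://example.com", while a URL with balanced parentheses in its path is left alone. The facet span should shrink by the same number of characters as the target.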

Overlaps

Facets must not overlap.

The parser should process markdown links first, because those segments rewrite the visible text. After that, it can process raw URLs in the remaining plain text regions. If overlap is detected anyway, discard the later overlapping facet rather than emitting invalid data.
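The overlap guard can be a small filter, sketched below. drop_overlaps is a hypothetical name. It assumes spans arrive in processing order (markdown links first, raw URLs second), so "later" means later in that order, and it treats ranges as half-open, so two spans that merely touch do not count as overlapping.

```python
def drop_overlaps(spans):
    """Keep spans in priority order, discarding any that overlap a kept one.

    spans: list of (start, end, url) with end exclusive.
    Returns the kept spans sorted by start position.
    """
    kept = []
    for start, end, url in spans:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, url))
    return sorted(kept)
```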

UTF-8 Indexing Requirement

This is the fragile part.

Bluesky facet ranges are indexed using UTF-8 byte offsets, not Python string indexes and not JavaScript UTF-16 offsets. The helper must calculate:

byteStart (inclusive)
byteEnd (exclusive)

from the final visible text after markdown link rewriting.

In Python, the safest shape is:

len(text[:index].encode("utf-8"))

for converting character positions in the final visible text into UTF-8 byte offsets.

We should explicitly test this with non-ASCII content before the link span.
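A sketch of the conversion and of the resulting facet, using non-ASCII text before the link. The function names are hypothetical. The dictionary mirrors the JSON shape of a Bluesky link facet (index with byteStart and byteEnd, plus a feature of type app.bsky.richtext.facet#link); whether the client library accepts plain dictionaries or needs its own model types is an assumption to verify.

```python
def char_to_byte_offset(text, index):
    """Convert a character index in text to a UTF-8 byte offset."""
    return len(text[:index].encode("utf-8"))


def make_link_facet(text, char_start, char_end, uri):
    """Build a link facet dict for the character span [char_start, char_end)."""
    return {
        "index": {
            "byteStart": char_to_byte_offset(text, char_start),
            "byteEnd": char_to_byte_offset(text, char_end),
        },
        "features": [{"$type": "app.bsky.richtext.facet#link", "uri": uri}],
    }


# "café ☕ " is 7 characters but 10 bytes in UTF-8.
text = "caf\u00e9 \u2615 https://example.com"
start = text.index("https")
facet = make_link_facet(text, start, len(text), "https://example.com")
print(start, facet["index"])
```

Here the URL starts at character 7 but at byte 10, which is exactly the drift the regression test should catch.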

Implementation Shape

Keep the code simple.

Recommended structure:

- Add a small helper for Bluesky text preparation inside scripts/publish_social.py, or extract a small sibling helper module if the file becomes hard to follow.
- Return a tuple or small object containing:
  - text
  - facets
- Update publish_bluesky() to call the helper before posting.
- Send the post with both text and facets.

The helper should be deterministic and free of network calls. That keeps it easy to test.
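Putting the rules together, the helper could look roughly like the sketch below. Everything here is illustrative: prepare_bluesky_text and the private helpers are hypothetical names, the regular expressions are deliberately narrow assumptions, and facets are plain dictionaries in the lexicon's JSON shape.

```python
import re

# Hypothetical, deliberately narrow patterns.
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")
RAW_URL = re.compile(r"https?://[^\s<>]+")


def _trim(url):
    """Drop trailing prose punctuation and unmatched ')' from a raw URL."""
    while url and (url[-1] in ".,;!?" or
                   (url[-1] == ")" and url.count(")") > url.count("("))):
        url = url[:-1]
    return url


def _byte_offset(text, index):
    """Convert a character index into a UTF-8 byte offset."""
    return len(text[:index].encode("utf-8"))


def prepare_bluesky_text(snippet):
    """Return (visible_text, facets) for one authored snippet. No network calls."""
    parts, spans, pos, out_len = [], [], 0, 0

    # Pass 1: rewrite markdown links, recording label spans as character indexes.
    for m in MD_LINK.finditer(snippet):
        before, label, url = snippet[pos:m.start()], m.group(1), m.group(2)
        start = out_len + len(before)
        spans.append((start, start + len(label), url))
        parts.extend([before, label])
        out_len, pos = start + len(label), m.end()
    parts.append(snippet[pos:])
    text = "".join(parts)

    # Pass 2: raw URLs in the final text; skip anything overlapping a kept span.
    for m in RAW_URL.finditer(text):
        url = _trim(m.group(0))
        start, end = m.start(), m.start() + len(url)
        if all(end <= s or start >= e for s, e, _ in spans):
            spans.append((start, end, url))

    # Pass 3: convert character spans to UTF-8 byte ranges.
    facets = [
        {
            "index": {"byteStart": _byte_offset(text, s),
                      "byteEnd": _byte_offset(text, e)},
            "features": [{"$type": "app.bsky.richtext.facet#link", "uri": u}],
        }
        for s, e, u in sorted(spans)
    ]
    return text, facets
```

publish_bluesky() would then call the helper and pass both values on, for example client.send_post(text=text, facets=facets, embed=embed). That call assumes the client's send_post accepts a facets argument, which should be confirmed against the client library in use.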

Testing Strategy

Add focused tests around the helper and the Bluesky publish path.

Minimum cases:

raw URL in snippet text produces one link facet
markdown link rewrites visible text and produces one link facet
mixed raw URL and markdown link produce two non-overlapping facets
trailing punctuation is excluded from the facet span for raw URLs
unicode before the linked span still produces correct UTF-8 byte offsets
existing Bluesky image embed flow still builds a valid post payload

If mocking the entire Bluesky client is awkward, test the helper directly and keep the publish-path test light.
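A light publish-path test might look like this sketch. The publish_bluesky below is a hypothetical stand-in with an assumed signature, not the real function in scripts/publish_social.py, and the client is a unittest.mock.Mock rather than a real Bluesky client. The point is only to show the payload assertion: text, facets, and the existing embed all reach send_post.

```python
from unittest.mock import Mock


def publish_bluesky(client, text, facets, embed=None):
    # Hypothetical stand-in for the real publisher: forwards text, facets, embed.
    client.send_post(text=text, facets=facets, embed=embed)


def test_publish_sends_facets_and_keeps_embed():
    client = Mock()
    facets = [{
        "index": {"byteStart": 0, "byteEnd": 19},
        "features": [{"$type": "app.bsky.richtext.facet#link",
                      "uri": "https://example.com"}],
    }]
    embed = object()  # placeholder for whatever the image embed flow builds

    publish_bluesky(client, "https://example.com", facets, embed=embed)

    # Exactly one post, carrying the facets and the unchanged embed.
    client.send_post.assert_called_once_with(
        text="https://example.com", facets=facets, embed=embed
    )
```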

Risks

Wrong byte offsets

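This is the main risk. Python character indexes and UTF-8 byte offsets agree only for pure ASCII text. If we use character indexes directly, every facet that follows or contains non-ASCII text (accented characters, emoji) will point at the wrong bytes, and Bluesky will receive wrong or malformed facet ranges.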

Mitigation:

compute offsets from UTF-8 encoded prefixes
add a unicode regression test

Parser drift

A homegrown parser can grow into a mess if it starts trying to handle every markdown edge case.

Mitigation:

support only the explicit issue scope
keep the parser narrow: raw URLs and inline markdown links only

Bad overlap behavior

If raw URL parsing runs over already rewritten markdown-link output, we could create overlapping or duplicated facets.

Mitigation:

parse markdown links first
then parse raw URLs on the final text, skipping spans already covered by a markdown-link facet
reject overlaps before returning facets

Rollout

add the helper
add tests for raw URLs, markdown links, punctuation, and unicode
update the Bluesky publisher to send facets
run the relevant test suite
dry-run the social publisher if possible
land through branch and PR

Decision

Issue #79 should be solved with one narrow Bluesky text-preparation helper that converts authored snippet text into:

final visible Bluesky text
valid non-overlapping link facets with UTF-8 byte offsets

This fixes today’s missing-link problem without turning the social publisher into a general markdown engine.
