labels: content moderation that doesn't suck

Originally on Leaflet ↗

i just added label support to pollen (does not load right now 😅 - it’s a tumblr-esque microblog. might just be for me. idk!) and the way atproto handles moderation is genuinely clever. it’s composable in a way that actually makes sense.

labels are just metadata

a label is someone saying “this thing has this property” at a specific time:

{
  src: "did:plc:ar7c4by46qjdydhdevvrndac",  // who applied it
  uri: "at://did:plc:someone/app.bsky.feed.post/abc123",  // what
  val: "porn",  // the label
  neg: false,  // true = remove this label
  cts: "2024-01-01T00:00:00Z"  // when it was applied
}

that’s it. the label spec has all the details but the core idea is really simple.

you pick your labelers

that src field is a DID, the labeler’s identity. bluesky runs one at mod.bsky.app but anyone can run one.

you know those labels on bluesky showing posting frequency or which network someone bridged from? same exact system. someone runs a labeler that watches for whatever criteria and applies labels. you subscribe, you see them.

profile labeller (flags rapid posters, bridged accounts, incomplete profiles)? labeler. xblock (hides screenshots from twitter and threads)? also a labeler. for something like pollen, you could imagine labelers for common tumblr-style needs: flagging unsourced art reposts, marking accounts that don’t tag their content, hiding screenshots from other platforms. the serious moderation stuff and the community norms stuff run on identical infrastructure. i love this.

in your settings you pick which labelers to trust and what each label should do.

three things labels can do

filter - don’t show it (spam, csam)
blur - cover it, click to reveal (nsfw)
alert - badge but visible (warnings, jokes)
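
here’s a rough sketch of how an app might turn “labelers i trust + what each label should do” into one decision per post. the `Prefs` shape, the `decideAction` name, the extra `"show"` action, and the “strictest action wins” rule are all my assumptions for illustration, not something the spec or any sdk defines.

```typescript
// the three behaviors above, plus "show" for "do nothing"
type Action = "filter" | "blur" | "alert" | "show";

// hypothetical settings shape: labeler DID -> label value -> action
type Prefs = Record<string, Record<string, Action>>;

interface Label {
  src: string; // who applied it (labeler DID)
  uri: string; // what it is about
  val: string; // the label
}

// strictest first, so "filter" beats "blur" beats "alert" beats "show"
const SEVERITY: Action[] = ["filter", "blur", "alert", "show"];

function decideAction(labels: Label[], prefs: Prefs): Action {
  let result: Action = "show";
  for (const label of labels) {
    // labels from labelers you have not picked fall through to "show"
    const action: Action = prefs[label.src]?.[label.val] ?? "show";
    if (SEVERITY.indexOf(action) < SEVERITY.indexOf(result)) {
      result = action;
    }
  }
  return result;
}

// example settings: trust one labeler, blur "porn", filter "spam"
const prefs: Prefs = {
  "did:plc:ar7c4by46qjdydhdevvrndac": { porn: "blur", spam: "filter" },
};
```

the nice part is that an unknown labeler or an unknown label value just does nothing, which matches the “you pick your labelers” idea.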

labels can be negated with neg: true if someone made a mistake. they can also expire, which matters because yesterday’s posting behavior shouldn’t follow you forever.
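
a minimal sketch of what negation and expiry mean when you’re holding a pile of labels: keep the newest one per (src, uri, val), then drop anything negated or expired. `activeLabels` is a name i made up, and i’m assuming the expiry lives in an optional `exp` timestamp next to `cts`; check the label spec before relying on that.

```typescript
interface Label {
  src: string;   // who applied it
  uri: string;   // what
  val: string;   // the label
  neg?: boolean; // true = remove this label
  cts: string;   // when it was applied
  exp?: string;  // assumed: optional expiry timestamp
}

function activeLabels(labels: Label[], now: Date = new Date()): Label[] {
  // keep only the newest label per (src, uri, val)
  const latest = new Map<string, Label>();
  for (const label of labels) {
    const key = JSON.stringify([label.src, label.uri, label.val]);
    const seen = latest.get(key);
    if (!seen || Date.parse(label.cts) >= Date.parse(seen.cts)) {
      latest.set(key, label);
    }
  }
  // a newest label that is a negation, or is past its expiry, is not active
  return Array.from(latest.values()).filter((label) => {
    if (label.neg) return false;
    if (label.exp && Date.parse(label.exp) <= now.getTime()) return false;
    return true;
  });
}
```

so a “porn” label followed by a later `neg: true` for the same src, uri, and val simply disappears, and a label with a past `exp` stops applying without anyone having to clean it up.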

subscribing to the stream

labelers expose a websocket:

wss://mod.bsky.app/xrpc/com.atproto.label.subscribeLabels

labels stream in as they’re applied. it’s cbor-encoded, not json, which confused me for about an hour until i found cbor-x. i honestly don’t know what cbor is. you can pass a cursor to resume where you left off. the subscription docs cover the details.
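
the cursor part is easy to sketch without touching the network. `subscribeUrl` and `nextCursor` are hypothetical helpers, and i’m assuming each event carries a numeric `seq` that you hand back as `cursor` when you reconnect. decoding the cbor frames is left out here.

```typescript
// build the websocket url, optionally resuming from a saved cursor
function subscribeUrl(host: string, cursor?: number): string {
  const base = `wss://${host}/xrpc/com.atproto.label.subscribeLabels`;
  return cursor === undefined ? base : `${base}?cursor=${cursor}`;
}

// remember the highest sequence number seen so far
function nextCursor(current: number | undefined, seq: number): number {
  return current === undefined ? seq : Math.max(current, seq);
}

// first connect: no cursor yet
const firstUrl = subscribeUrl("mod.bsky.app");

// after processing events with seq 41 and 42, reconnect from 42
let cursor: number | undefined = undefined;
cursor = nextCursor(cursor, 41);
cursor = nextCursor(cursor, 42);
const resumeUrl = subscribeUrl("mod.bsky.app", cursor);
```

persist the cursor somewhere (a file, a db row) and a restart picks up where the last connection stopped instead of replaying everything.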

why this works

transparent: you see what’s labeled and by whom
portable: labels follow content across apps
reversible: negation and expiration exist
your choice: subscribe to what you want

the same system powers “this is csam, filter it” and “this person uses arch linux, lol”. that’s good design.

what i could have done with this in the past

i spent five years at 🎏 glitch (rip), and content moderation was a constant headache no matter the size of the company. small team, endless scammers, crypto spam, phishing sites, the occasional person hosting malware. we built our own moderation tools, hired people, and still couldn’t keep up.

with something like these labels, we could have done what we were already doing (running our own moderation, flagging spam and scams) but also subscribed to community efforts. imagine a labeler run by security researchers flagging known phishing kits. or a community-maintained list of crypto scam templates. we wouldn’t have to trust them blindly: we could evaluate the labeler’s track record and decide how to handle each label type.

we built some of this internally (flagging patterns across projects, sharing signals between moderation decisions) but it was bespoke and isolated. for example, we took screenshots of projects when they loaded and compared them to known scams. at the very least, those are things we could have open-sourced for the rest of the community.