<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Tom&#39;s Opinions</title>
    <link>https://octet-stream.net/b/to/</link>
    <description></description>
    <managingEditor>tom.karpiniec@outlook.com (Thomas Karpiniec)</managingEditor>
    <pubDate>Thu, 19 Feb 2026 23:13:08 +0000</pubDate>
    <item>
      <title>Several people are typing...</title>
      <link>https://octet-stream.net/b/to/several-people-are-typing.html</link>
      <description></description>
      <content:encoded><![CDATA[<p>
I use the chat system <a href="https://matrix.org/">Matrix</a> regularly. For something so distributed and open I continue to be amazed how well it all works. My account is hosted by a friendly fellow across town whom I don't actually know very well and it's absolutely endearing to me how the server goes down when the power goes out. (Then syncs back up because, well, Matrix.)
</p>

<p>
Still, every now and again I'm drawn back to <a href="https://irssi.org/">irssi</a> and IRC for a while. Many Matrix rooms are bridged so I can talk to most people either way. It's outdated and outclassed but it's always so outrageously <em>comfortable</em>. I can point to various explanations—nostalgia for IRC sessions in years past, the professional joy of simple and efficient protocols, licence to go AFK rather than tethered to a smartphone, the super-fast clients with native UIs.
</p>

<p>
They're all valid enough but if there's one difference in IRC that actually elicits a sigh of relief from me then it's this: typing indicators and read receipts.
</p>

<p>
They aren't there.
</p>

<p>
Thank the heavens. Finally we can chill and just <em>chat</em> without all this <em>pressure</em>.
</p>

<p>
Surely it's not just me? Putting my life on hold while my brain churns, trying to anticipate what the next message will be, deriving hints from how long it's taking, watching if there are any pauses, re-reading what I just said to make sure it wasn't too dumb.
</p>

<p>
Then once I hit send, waiting to see who's watching. Oh, that person who uses a lot of reaction emojis saw it and didn't react. Not good enough, write a better message next time. Damn it all, now the group chat has its own Twitter-dopamine hedonic treadmill.
</p>

<p>
Well I'm sure it's not like that for everyone. It's not like I always had this ridiculous sort of thought process. Social media is a helluva drug, particularly microblogging, and now it's hard not to overthink things even in simple chat scenarios.
</p>

<p>
It reminds me of the notion that the internet has bamboozled our collective brains with its rapid context-switching, infinite scroll and notifications and now many people find it difficult to read a full chapter of a book without getting distracted partway through. In this case the solution is clear enough—take more breaks from the internet and practise reading the dang book until the brain gets itself straight. I know this works because I had to do it myself a few years back.
</p>

<p>
I suppose that's going on when I fire up irssi. Partly it's for my own good. If I don't have those UI indicators then I can't get worked up about them. Some time away means they're less likely to enthrall me later, or I'll at least recognise anew how beguiling they are and try to distance myself mentally. But it's partly also nostalgia for when <em>nobody</em> had those UI elements.
</p>

<p>
Sometimes you go to type something and think better of it. Or you spend 10 minutes composing a message and then delete most of it. Far out, now some people know. Which ones though? Well, we're not entirely sure because we haven't quite got to the point of read receipts for typing indicators but I'm sure a startup will be founded on this concept before long.
</p>

<p>
We used to have rapid fire conversations—who needs a full stop key when you can just press enter? And if you really want someone to know that you read something you just type "lol". Easy done.
</p>

<p>
Best of all, lurking on IRC is actually private. If there's a spicy argument going on and my little icon is flitting down to each new message as they arrive I'm half expecting the enraged participants to stop and have a go at the bystander. I find myself doing weird things like keeping the window visible and unfocused so I can read along without suspicion.
</p>

<p>
Before anyone sends me a tirade about open source, I'm aware that Matrix has a plethora of clients and I'm free to use or write a program that acts like an IRC user, in which I'm only active at the precise moment I send a message. True enough, but it's not just a Matrix thing, and I'm pretty sure it's not just a me thing. It is my humble opinion that we're raising the stakes on text chat too high. Maybe we'd all be calmer if messages just happened when they happened? Email got one thing right at least.
</p>


]]></content:encoded>
      <author>Thomas Karpiniec</author>
      <guid>urn:uuid:35475642-9d57-11ef-954e-67d13ca72760</guid>
      <pubDate>Thu, 08 Aug 2024 12:00:00 +1000</pubDate>
    </item>
    <item>
      <title>LLMs are DRM for information</title>
      <link>https://octet-stream.net/b/to/llms-are-drm-for-information.html</link>
      <description></description>
      <content:encoded><![CDATA[<blockquote>
Oh man I just realised I'll be having conversations like this in 5 years<br>
"Hey I was trying to find your opening hours but couldn't find them anywhere"<br>
"Really? We submit them to both Gemini and GPT"
</blockquote>

<p>(<a href="https://social.octet-stream.net/@thomask/113518496643958419">me, earlier</a>)</p>

<p>
In this post I want to make a few unhappy observations. I will also attempt to predict an enshittification cycle in advance. Many would say that AI is already enshittified but in this case I'm actually talking about Doctorow's original formulation, not just trying to say "shitty" ornately. At the end I've included an upbeat call to action but we all know it won't do anything. Let's begin.
</p>

<p>
I believe we are currently experiencing the "good old days" of LLM technology. Tech companies are hoovering up everything on the open internet <a href="https://www.abc.net.au/news/2023-09-29/australian-authors-copyright-books3-generative-i-chatgpt/102914538">and elsewhere</a> to feed their models so that it can be regurgitated in different forms. Certainly this is unpleasant but for now it's kind of background noise. The LLMs are a low resolution screenshot of the broader internet and you can ignore them more or less without peril. Everything you need is available other ways.
</p>

<p>
The trouble is, this training on all the world's data is no longer enough of a commercial advantage for these companies. Lots of people are making them and the open models are catching up. If you've already ingested everything, where do you go from there?
</p>

<p>
Obviously you want new and exclusive data. That way your LLM can answer questions that nobody else's can and you become more competitive. However, collecting new forms of contemporary data is complex and expensive. Instead, what if we made individuals and businesses feed in relevant data en masse, all on their own? What if they had a motivation to do so?
</p>

<p>
Consider Sammy, the owner of a small fictional cafe on an alley in Fitzroy. Her supplies of eggs have dried up for two weeks due to a fictional H5N1 outbreak and she needs to take items off her menu.
</p>

<p>
"Hey," she says into her Pixel. "Update the public menu information for the cafe. The PDF is in my Google Drive but get rid of all the recipes containing eggs. We hope to have them back on the 22nd."
</p>

<p>
What better way to get people to feed your model unique data destined for public consumption? Self-interest and a low-friction interface.
</p>

<p>
Here's where we can get even more devious. A naive Google would interpret Sammy's instructions, update a database of structured data, and display the new information on Google Maps. Evil Google can instead do two things: actively avoid turning this into structured data&mdash;simply store it for training&mdash;and ensure this information can be retrieved only by directing queries at their own AI tool, like at the top of a Google search.
</p>

<p>
Now their LLM is not only more valuable, it is virtually impossible to scrape. Output is customised for each user and their specific query and circumstances. Keeping information in unstructured format and accessing it via weights and APIs is a strong defence against anybody else hoovering up <em>your</em> data. It's DRM for information.
</p>

<p>
So here is how the enshittification happens, maybe.
</p>

<ol>
<li><b>Be good to users.</b> Best I can tell, this part is still underway though people are starting to grumble about ChatGPT's quality. Tokens are cheap and there are plenty of optimistic users who feel like the tools are on their side.
<li><b>Abuse users to make things better for business customers.</b> Businesses pay to get LLMs to spruik their products while making it sound natural(-ish). For a while it will be flexible and inexpensive to get in front of the kind of people who like talking to AIs all the time.
<li><b>Abuse the business customers.</b> Now the mainstream are talking to AIs to search for information so you have to pay to be there. Worse still, they don't tend to swap between them&mdash;a particular person speaks to the same AI all the time so if you want good reach you're compelled to advertise on all the major ones. It's gonna cost you. Advertisers start backing out and profits start drooping.
<li><b>Then they die.</b> And something of value is lost&mdash;all the genuine information that could have been put on a WordPress site in the first place.
</ol>

<p>
What can we do about it? I guess stop giving corporations datasets for free unless you're putting them somewhere else too. At least in my part of the world, OpenStreetMap has terrible information about basic details like business opening hours so consider contributing to those. Also please build great apps that build on truly open data sets like OSM&mdash;free apps, proprietary apps, I don't care&mdash;the more of everyday computing is based on open and structured data, the less will get locked up inside LLMs.
</p>

]]></content:encoded>
      <author>Thomas Karpiniec</author>
      <guid>urn:uuid:fa8b3bee-a7f2-11ef-ac48-1f29877782a2</guid>
      <pubDate>Thu, 21 Nov 2024 21:33:00 +1100</pubDate>
    </item>
    <item>
      <title>Bluesky, Fedi, and making centralisation modular</title>
      <link>https://octet-stream.net/b/to/bluesky-fedi-making-centralisation-modular.html</link>
      <description></description>
      <content:encoded><![CDATA[<p>
In <i>the discourse</i> it's pretty well established by this point that <a href="https://en.wikipedia.org/wiki/Bluesky">Bluesky</a> is not meaningfully decentralised, with <a href="https://dustycloud.org/blog/how-decentralized-is-bluesky/">Christine Lemmer-Webber giving us the most comprehensive breakdown</a>. I've always been a little prickly about this. I like decentralisation; I don't like having a single company responsible for keeping the lights on in a given communication platform.
</p>

<p>
However, over the course of thinking about it my stance has softened. In truth I haven't been fair to AT Protocol and the elephant in the room is <em>search engines</em>. For those of us in the decentralisation fan club, which includes the traditional approach of publishing on your own website, we're leaning massively on these enormous centralisation points all the time to paper over the lack of discoverability<a id="footnote-1-ref" href="#footnote-1">[1]</a>. Is it reasonable to criticise AT for incorporating fit-for-purpose centralised aspects directly into their design? I think yes and no. It's not the worst thing but some parts could be better.
</p>

<p>
Let's start with another system that's federated: email. If you start composing a message and type in somebody's name, autocomplete does not show you a list of every email address on the planet owned by someone with that name. There is an expectation that there's some out-of-band mechanism for finding somebody's email address. Maybe they put it on their website. Maybe you find that website using a search eng- oh, damnit. But anyway you don't have to use a search engine. They can scribble it on a piece of paper. You can send the email, and it doesn't have to pass through any servers funded by <a href="https://en.wikipedia.org/wiki/Bluesky#Public_launch">a company with "Blockchain" in the name</a>.
</p>

<p>
On the other end of the spectrum, the Instagram website knows every Instagram user that exists so you can search for them even if you don't know their handle. It's simple and reliable but puts a lot of dependence on a single entity.
</p>

<p>
What about the web? From a technological standpoint it's basically the same as email. If you plug in a computer and get access to the IPv4 and IPv6 address space you have no particular knowledge of what IPs are live, what domains exist on those IPs, or what content might be on the associated webpages. It's easy to forget how spartan it is because half of our industry exists to compensate for these shortfalls, providing us with a wide range of tools which are all forms of centralisation. We have search engines, we have link aggregators, and we have websites like Facebook and LinkedIn which simply try to put every person on the same domain so there's no ambiguity about where to find them.
</p>

<p>
Maybe that last sentence feels a little uncomfortable&mdash;Bing and Hacker News are fine for what they do but does <em>Facebook</em> deserve to be in the same list? Probably not. In my mind there are two things which set search engines and aggregators apart from the big social media networks.
</p>

<p>
First, they work with the idea that the web is modular. Anyone can make a search engine (just ask <a href="https://search.marginalia.nu/">Viktor</a>) and you can visit any search engine you like. If one doesn't do what you want then ignore it and choose another one.
</p>

<p>
Second, your interactions with them are ephemeral and bounded. You lean on the expensive and complicated centralisation in the moment that you need it to accomplish a particular goal and then you stop communicating with them. When you search for a website you can put it in your bookmarks. When you find cool blogs on <a href="https://tildes.net">tildes.net</a> you can put them in your RSS reader. Facebook and LinkedIn don't stop there. They insist that you continue to perform all interactions through their proprietary space, even if you have switched to something that <em>could be</em> point-to-point like having a chat.
</p>

<p>
Let's bring this back to Fediverse vs Bluesky. There are various centralised features you might want.
</p>

<table border="1">
<tr><th>Feature</th><th>Central?</th><th>Does Tom care?</th></tr>
<tr><td>Follow specific people, post to followers</td><td>No</td><td>Yes, this is basically what I want to do.</td></tr>
<tr><td>Realtime notifications for replies</td><td>No</td><td>Yes, this is important for conversation.</td></tr>
<tr><td>Search for users</td><td>Yes</td><td>No, I'm happy to learn about new people through boosts or their websites.</td></tr>
<tr><td>Search for content</td><td>Yes</td><td>Yes, I would love a global search engine for the fediverse.</td></tr>
<tr><td>Global hashtags</td><td>Yes</td><td>Kind of, it can be fun or useful for events but the SNR is generally poor. I mostly use tags for blocking (both others' posts, and a courtesy so others can block mine).</td></tr>
<tr><td>See all replies on own and viewed posts</td><td>No</td><td>Yes, it's annoying when you're missing things. The fact that this doesn't work on the fediverse is an implementation detail/choice that could be changed.</td></tr>
<tr><td>Accurate like/boost counts</td><td>Yes</td><td>It would be nice if they were updated occasionally rather than not at all but this is generally an unhealthy part of social media to focus on.</td></tr>
<tr><td>Trending users or content</td><td>Yes</td><td>No.</td></tr>
</table>


<p>
I can draw a few conclusions:
</p>

<ol>
<li>What degree of centralisation you want or don't want probably depends heavily on personal preferences about how you use social media (or views about how social media <em>should</em> be if you're that sort of personality).
<li>You can have something that's technically a social network without any centralised aspects whatsoever but it's going to be a bit dreary.
<li>Some of these centralised capabilities require integration into the social network protocol, such as tracking dopamine accurately. Many others could be implemented by having external modular services (websites) that consume a firehose to provide search or dashboards of trending content or so on. Opinions will vary whether it's better to have one choice in-app, or multiple modular choices out-of-app.
<li>Once you start tracking all the content centrally for one feature you kind of may as well do it all?
</ol>

<p>
So if I'm willing to take a step back and consider that maybe some of my opinions are a little fringe, then atproto's architectural decisions seem pretty reasonable. The fact that anybody can run a relay means there's a possibility of an ecosystem of modular services building on the data. However a desire to bring all the global/realtime features that users want into the Bluesky app itself means that it has recentralised into a particular AppView and now you're dependent on that enormous component for even the basic things like knowing what your followers posted recently. There will be less incentive for people to build or use third party services when they've been pre-sherlocked by Bluesky itself.
</p>

<p>
At the same time there are real weaknesses in the fediverse solution. It could borrow some concepts from atproto but do it the modular webby way. I would love it if there were a fediverse firehose&mdash;many of them, where people who want to run search engines can subscribe to the public feeds of all the instances they're aware of and get the latest updates for indexing and analysis. We could have a vibrant ecosystem of search, global hashtags and entertaining trends all provided by third parties, where we get the benefits but our interactions with and dependence on those systems are bounded to those particular tasks. Meanwhile core messaging functions remain instance-local and users are free not to opt-in to sending data to the firehose.
</p>

<p>
If it were possible to bring about this modular approach to centralisation, I'd have confidence in the fediverse being the platform that can achieve a happy balance in the long term. Individual aggregators can come and go while the core business of the network is resilient and private. If something goes wrong with Bluesky, well, you kind of have to throw away the whole thing and start again. I know some people are happy to treat their social media platforms as discardable things where you can follow the crowd from year to year. That's valid but unsatisfying and I'd love to find a combination of technology that's more enduring.
</p>

<hr>

<ol>
<li id="footnote-1">For the purposes of this essay I will ignore the fact that our glorious decentralised web is also utterly dependent on DNS and CA PKI. If you enjoy being gloomy about this sort of thing, I recommend <a href="https://computer.rip/2022-01-16-peer-to-peer-but-mostly-the-main-peer.html">this Computers Are Bad post</a>. <a href="#footnote-1-ref">↩︎</a>
</ol>

]]></content:encoded>
      <author>Thomas Karpiniec</author>
      <guid>urn:uuid:ce9ccba6-ab94-11ef-9a6b-636ed7228cd3</guid>
      <pubDate>Tue, 26 Nov 2024 12:31:00 +1100</pubDate>
    </item>
  </channel>
</rss>