For You is based on your likes. If you get an empty feed, you probably haven't liked anything yet. Try liking a couple of posts in the Discover feed and then go back to For You.
To help me debug the algorithm, I built a simple web UI that lets you see the feed for any user by plugging in their account id: https://linklonk.com/bluesky
You can switch perspective to other users and explore how they would experience the feed.
Yeah, understood. I'm excited for the reduction in parameter count that will come when this is taken up in major models.
I meant it rhetorically, in reference to interpretability. I don't see a real difference between training a 100B-parameter model vs a (fixed) 4x-recurrent 25B-parameter model when it comes to understanding what the model is `thinking` for the next-token prediction task.
You should be able to use the same interpretability tooling for either. It can only `scheme` so much before it outputs the next token, whether the model is a fixed size and quite deep or recurrent.
What you are describing is similar to how https://LinkLonk.com works (my side project) - when you "like" a link, you get connected to the RSS feeds that posted that link and to other users who also liked it. Then you get content from the feeds and users you are connected to. The more links you have in common with a feed or a user, the more weight their other links carry.
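Roughly, you can picture that connection step like this (an illustrative Python sketch, not LinkLonk's actual code; all names and numbers are made up):

```python
# Illustrative sketch only - not LinkLonk's actual implementation.
from collections import defaultdict

def connection_weights(my_likes, source_links):
    """Weight each source (an RSS feed or another user) by how many liked links you share."""
    weights = defaultdict(int)
    for source, links in source_links.items():
        weights[source] = len(my_likes & links)  # links in common
    return weights

my_likes = {"link-a", "link-b", "link-c"}
source_links = {
    "feed:example-blog": {"link-a", "link-b", "link-x"},  # 2 links in common
    "user:alice": {"link-c"},                             # 1 link in common
}
# Sources with more overlap contribute more weight to their other links.
print(connection_weights(my_likes, source_links))
```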
My understanding is that the attention in all transformer layers is "causal" - that is, the output of a transformer layer for token N depends only on tokens 0 through N.
This means that every attention layer can reuse previously calculated outputs for the same prompt prefix, so it only needs to compute from scratch starting at the first token where the prompt differs.
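Here is a tiny numpy toy (my own illustration, not from the post) showing why the reuse works - with a causal mask, the outputs for a shared prefix come out identical, so they can be cached:

```python
import numpy as np

def causal_attention(x):
    """Single-head attention with a causal mask: the output at position n
    depends only on inputs 0..n."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf                      # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
prefix = rng.normal(size=(5, 8))                  # shared prompt prefix
a = np.vstack([prefix, rng.normal(size=(2, 8))])  # prompt A
b = np.vstack([prefix, rng.normal(size=(3, 8))])  # prompt B
# The first 5 output rows match, so they could be cached and reused.
assert np.allclose(causal_attention(a)[:5], causal_attention(b)[:5])
```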
I think the commenter was thinking about the input embedding layer, where the model looks up a token's embedding by index, which is constant time.
And the blog post author is talking about the output layer, where the model has to produce a prediction for every possible token in the vocabulary. Each prediction is a dot product between the transformer hidden state (D) and the token embedding (D) (whether shared with the input or not), computed for every token in the vocabulary (V). That's where the VD comes from.
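A minimal numpy sketch of that contrast (toy sizes, my own illustration):

```python
import numpy as np

V, D = 32_000, 256                # toy vocabulary and hidden sizes
embedding = np.random.randn(V, D) # token embeddings (possibly tied with the output layer)
hidden = np.random.randn(D)       # final hidden state for one position

# Input side: fetching a token's embedding is a single row lookup, O(D).
token_id = 1234
input_vector = embedding[token_id]

# Output side: one dot product per vocabulary entry, O(V*D) in total.
logits = embedding @ hidden       # shape (V,)
next_token = int(np.argmax(logits))
```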
It would be great to clarify this in the blog post to make it more accessible but I understand that there is a tradeoff.
The point is, if you upvote this link on LinkLonk (https://linklonk.com/item/481037215144673280), you automatically get subscribed to all of these feeds. This is a way to discover new feeds through content you liked.
Now, being connected to hundreds or thousands of feeds might seem crazy. But we have a solution for that, which also relies on what content you "liked": LinkLonk knows how often you liked content from each feed you are connected to (which is essentially the signal-to-noise ratio) and ranks new content based on that. If you like 50% of posts from https://simonwillison.net/atom/everything/ then new posts from Simon Willison will be shown above other links from, say, https://lobste.rs/rss.
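In code, the ranking idea is roughly this (an illustrative sketch with made-up numbers, not LinkLonk's actual logic):

```python
# Illustrative sketch only - the numbers and names are invented.
def feed_ratio(liked, rated):
    """Fraction of a feed's items you ended up liking - its signal-to-noise ratio."""
    return liked / rated if rated else 0.0

stats = {  # feed -> (items you liked, items you rated)
    "https://simonwillison.net/atom/everything/": (10, 20),  # 50% liked
    "https://lobste.rs/rss": (5, 100),                       # 5% liked
}
# New items inherit their feed's ratio, so Simon Willison's posts rank higher.
ranked = sorted(stats, key=lambda feed: feed_ratio(*stats[feed]), reverse=True)
```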
The more you like - the better the ranking of fresh content becomes.
In this world you don't have to actively manage which feeds you are subscribed to. You only rate content.
I haven't used Artifact, but my understanding is that it uses "AI" to personalize the feed of content and that the sources it aggregates come from an allow list of publishers.
LinkLonk differs in these two aspects:
1. The algorithm is intentionally simple: like content to get more from that publisher (i.e., its RSS feed) and from other users who liked it; dislike to get less (see the sketch after this list). There is no AI, so that you as a user stay in control. For example, LinkLonk does not use your view history to guess what else you would like.
2. The sources are any RSS feeds users have added. LinkLonk also automatically tracks any feed that posted content users liked. In this respect LinkLonk is more similar to RSS readers.
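To make point 1 concrete, here is the kind of rule I mean (a hedged sketch - the update values and names are invented, not LinkLonk's real code):

```python
# Hedged sketch of the like/dislike rule from point 1 - purely illustrative.
from collections import defaultdict

weights = defaultdict(float)    # connection strength per feed or user

def rate(source, liked):
    """Liking strengthens a connection; disliking weakens it."""
    weights[source] += 1.0 if liked else -1.0

rate("feed:example-blog", liked=True)    # show me more from this feed
rate("feed:noisy-site", liked=False)     # show me less from this one
# Items from higher-weight sources are ranked first; nothing else is inferred.
```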