
How do you all add and subtract concepts in the rabbit poem?


Features correspond to vectors in activation space. So you can just do vector arithmetic!
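As a rough illustration of what "vector arithmetic" means here, the toy sketch below removes one direction from an activation vector and adds another. Everything in it is an assumption made for illustration: the `steer` helper, the `rabbit_dir` and `other_dir` names, the 16-dimensional space, and the random vectors standing in for real feature directions. It is not the code used in the research.

```python
import numpy as np

def steer(activation, remove_dir, add_dir, scale=1.0):
    """Hypothetical helper: project out remove_dir, then add scale * add_dir.

    Both directions are normalized to unit length first.
    """
    remove_unit = remove_dir / np.linalg.norm(remove_dir)
    add_unit = add_dir / np.linalg.norm(add_dir)
    # How strongly the activation currently points along the feature to remove.
    strength = activation @ remove_unit
    # Subtract that component, then add the new feature at the chosen scale.
    return activation - strength * remove_unit + scale * add_unit

# Toy data: random vectors stand in for real feature directions (assumption).
rng = np.random.default_rng(0)
d_model = 16
rabbit_dir = rng.normal(size=d_model)  # hypothetical "rabbit" feature direction
other_dir = rng.normal(size=d_model)   # hypothetical replacement feature direction
activation = rng.normal(size=d_model) + 3.0 * rabbit_dir

steered = steer(activation, rabbit_dir, other_dir, scale=3.0)
```

In a real model the directions would come from a learned feature dictionary rather than random draws, and the edited activation would be written back into the forward pass at a chosen layer and token position.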

If you aren't familiar with thinking about features, you might find it helpful to look at our previous work on features in superposition:

- https://transformer-circuits.pub/2022/toy_model/index.html

- https://transformer-circuits.pub/2023/monosemantic-features/...

- https://transformer-circuits.pub/2024/scaling-monosemanticit...



