This is exactly it! We built a browser agent and got awesome results by designing the context in a simplified/compact version + using small/efficient LLMs - it's smooth.sh if you'd like to try
This is a very valid concern. Here are some of our initial considerations:
1. Security of these agentic system is a hard and important problem to solve. We're indexing heavily on it, but it's definitely still early days and there is still a lot to figure out.
2. We have a critic LLM that assesses among other things whether the website content is leading a non-aligned initiative. This is still subject to the LLM intelligence, but it's a first step.
3. Our agents run in isolated browser sessions and, as per all software engineering, each session should be granted minimum access. Nothing more than strictly needed.
4. These attacks are starting to resemble social engineering attacks. There may be opportunities to shift some of the preventative approaches to the LLM world.
Thanks for asking this, we should probably share a write-up on this subject!
> 2. We have a critic LLM that assesses among other things whether the website content is leading a non-aligned initiative. This is still subject to the LLM intelligence, but it's a first step.
> [...]
> 4. These attacks are starting to resemble social engineering attacks. There may be opportunities to shift some of the preventative approaches to the LLM world.
With current tech, if you get to the point where these mitigations are the last line of defense, you've entered the zone of security theater. These browser agents simply cannot be trusted. The best assumption you can make is they will do a mixture of random actions and evil actions. Everything downstream of it must be hardened to withstand both random & evil actions, and I really think marketing material should be honest about this reality.
I agree, these mitigations alone can't be sufficient, but they are all necessary within a wider framework.
The only way to make this kind of agents safe is to work on every layer. Part of it is teaching the underlying model to see the dangers, part of it is building stronger critics, and part of it is hardening the systems they connect to. These aren’t alternatives, we need all of them.
Thanks! It all boils down to (1) using small and efficient models, and (2) insisting on good context engineering. We describe the browser state in a way that's both compact and meaningful. This allows us to use tiny LLMs under the hood.
Thanks for sharing this! It sounds very interesting. We experimented with a similar tree setup some time ago and it was giving good results. We eventually decided to move towards graphs as a general case of trees. I think the notion of using embeddings similarity for "walking" the graph is key, and we're actively integrating it in FastGraphRAG too by weighting the edges by the query. It's very nice to see so many solutions landing on similar designs!
These are knobs that you can tune to make the graph construction more/less opinionated. Generally speaking, the more we make it opinionated the better it fits the task.
At a high-level:
(1) Domain: allows you to "talk to the graph constructor". If you care particularly about one aspect of your data, this is the place to say it. For reference, take a look at some of the example prompts on our website (https://circlemind.co/)
(2) Example Queries: if you know what class of questions users will ask, it'd be useful to give the system this information so that it will "keep these questions in mind" when designing the graph. If you don't know which kinds of questions, you can just put a couple of high-level questions that you think apply to your data.
(3) Entity Types: this has a very high impact on the final quality of the graph. Think of these as the types of entities that you want to extract from your data, e.g. person, place, event, etc
All of the above help construct the knowledge graph so that it is specifically designed for your use-case.