My take is: it's either there with all of its features and popularity, or it's not. The argument that it will be taken down if it gets more popular seems fundamentally wrong to me.
I wonder why it doesn't use RAG for code context retrieval across the codebase. Is there some fundamental reason why, for example, init in Claude Code makes this stupid claude.md file instead of vectorizing and indexing the codebase locally?
The fundamental reason is that RAG kind of sucks and requires a ton of effort/optimization to reach a high degree of reliability for most applications. RAG solutions are not Anthropic's core product. Just reading all the relevant files is more expensive, but it's more effective and more efficient from a dev-time perspective.
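For a sense of what "vectorizing and indexing the codebase locally" actually entails, here's a bare-bones, hypothetical sketch (not what Cline or Claude Code do): chunk the source files, embed the chunks, and rank them against a query by cosine similarity. Every choice in it — chunk size, embedding model, ranking, keeping the index fresh as files change — is a knob you'd have to tune, which is exactly the effort/optimization cost I mean. The sentence-transformers model and the src/ layout are assumptions.

```python
# Bare-bones local code RAG sketch (hypothetical; not how Cline/Claude Code work).
# Chunk files, embed the chunks, then rank them against a query by cosine similarity.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed installed

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def chunk_file(path: Path, lines_per_chunk: int = 40):
    """Yield (location, text) chunks using a fixed line window."""
    lines = path.read_text(errors="ignore").splitlines()
    for i in range(0, len(lines), lines_per_chunk):
        yield f"{path}:{i + 1}", "\n".join(lines[i:i + lines_per_chunk])

# Build the index: one embedding per chunk (assumes a src/ directory of Python files).
chunks = [c for p in Path("src").rglob("*.py") for c in chunk_file(p)]
vectors = model.encode([text for _, text in chunks], normalize_embeddings=True)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the file:line locations of the k most similar chunks."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity, since embeddings are normalized
    return [chunks[i][0] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("where do we validate user sessions?"))
```

And this is the naive version: before it's reliable you still have to deal with syntax-aware chunking, re-indexing on edits, and deciding how many chunks to stuff into the context.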
From time to time, I use Cline for coding. From my perspective, LLM models are not yet able to grasp all of the complexities of designing any larger modular system. But what Cline does quite well, in my opinion, is perform mundane tasks that would otherwise require me to code them myself. I tell it what I want to do and where it should be located, and then I try to provide as much context in the initial prompt as I can, in the form of files and class names it should take inspiration from. If I do this properly, it's able to perform the coding task sufficiently well.
My problem with it is when it's editing large files (1000+ LOC): requests consume a lot of tokens, AND it has problems applying the edits, so it sometimes cycles endlessly trying to modify two lines in some function.
Anyway, I like it more than Cursor, because of the control I have over the model, and in some subjective way it's more pleasing to me to watch it "work".
I'm surprised you have to provide that much context manually. Cline (like many other agentic coding assistants) uses tree-sitter to build a syntax tree and uses that to navigate the codebase and request the files it should open and load into context. In my experience that works quite well. On the other hand, every codebase is different, so YMMV.
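To give a flavor of what that looks like, here's a rough sketch (illustrative only, not Cline's actual implementation; note the tree_sitter Python binding's API has shifted a bit between versions): parse a file and list its top-level definitions, which is the kind of lightweight "code map" a model can use to decide which files to ask for.

```python
# Rough sketch of a tree-sitter "code map": list a file's top-level definitions.
# Illustrative only (not Cline's code); assumes the tree_sitter and
# tree_sitter_python packages.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

source = open("example.py", "rb").read()  # hypothetical file
tree = parser.parse(source)

# Walk only the top level: enough for a map of what the file defines.
for node in tree.root_node.children:
    if node.type in ("function_definition", "class_definition"):
        name = node.child_by_field_name("name")
        print(node.type, name.text.decode(), f"line {node.start_point[0] + 1}")
```

An agent can feed a map like this for the whole repo into the prompt and then request the full contents of only the files that look relevant.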
It's definitely not great when it comes to giant files. Now granted, those are an antipattern anyway, but sometimes it's just the way it is, and having diffing fail and then waiting while it burns through tokens trying to write out the whole file is a little meh.
That being said, I'm quite impressed overall. When it works, it's quite wonderful how it uses function calling to effectively hand control of VSCode over to the LLM for spawning commands, reading the diagnostics, etc.
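For anyone curious, that "handing control over" is just tool use under the hood. Here's an illustrative sketch in Anthropic's tools format; the tool names and schemas are made up for the example (not Cline's actual definitions), and the model alias is just one I picked.

```python
# Illustrative tool definitions (hypothetical names/schemas, not Cline's actual ones).
# The editor exposes capabilities as tools; the model decides when to call them.
import anthropic

tools = [
    {
        "name": "execute_command",
        "description": "Run a shell command in the workspace and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "read_diagnostics",
        "description": "Return current compiler/linter diagnostics for a file.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Fix the type errors in src/app.py"}],
)

# The extension would execute any tool_use blocks and feed the results back.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The extension runs whatever tool calls come back (spawn the command, collect the diagnostics) and feeds the results into the next request, which is what makes it feel like the LLM is driving the editor.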
For the multi-taskers among us, this is gold. Kicking off small atomic feature implementations in parallel and hopping between Cline sessions is... interesting.
I prefer it to work on fairly narrowly scoped atomic features, and for those my prompts are quite minimal, or I do a short back and forth in "plan" mode until I'm happy with the plan.
For more complex features I do spec out the prompt in more detail.
The model can output at most 4k tokens, I believe, though I think Cline just runs inference in multiple steps, so that shouldn't be a limitation that matters (only time/cost-wise).
I don't know if the point of this is just to redirect public attention to the narrative "hey, the Chinese stole our model, that's not fair, we need compute", when DeepSeek has clearly made some exceptional technical breakthroughs with the R1 and V3 models. Which, even if they did steal data from OpenAI, is an achievement in its own right.
Has anyone ever tried to build automatic email workflow autoresponder agents?
Let's say I want some outcome; the agent would autonomously handle the process, prompting me and the other side for additional requirements if necessary, and then, based on that, carry the thread through and reach the outcome?
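I haven't built one, but the control loop seems simple enough to sketch. Everything below is hypothetical: fetch_new_messages()/send_email() are placeholders for real IMAP/SMTP plumbing, and decide_next_step() stands in for the LLM call that would read the thread and draft the next message.

```python
# Hypothetical sketch of an email autoresponder agent loop.
# fetch_new_messages()/send_email() are placeholders for IMAP/SMTP plumbing,
# and decide_next_step() stands in for an LLM call.
import time
from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    subject: str
    body: str

def fetch_new_messages() -> list[Email]:
    return []  # real version: poll the inbox over IMAP

def send_email(to: str, subject: str, body: str) -> None:
    print(f"-> {to}: {subject}")  # real version: send over SMTP

def decide_next_step(goal: str, thread: list[Email]) -> dict:
    # Real version: LLM call that returns one of
    #   {"action": "reply", "to": ..., "subject": ..., "body": ...}
    #   {"action": "ask_owner", "question": ...}
    #   {"action": "done"}
    return {"action": "done"}

def run(goal: str, owner: str = "me@example.com") -> None:
    thread: list[Email] = []
    while True:
        thread.extend(fetch_new_messages())
        step = decide_next_step(goal, thread)
        if step["action"] == "done":
            break
        if step["action"] == "reply":
            send_email(step["to"], step["subject"], step["body"])
        elif step["action"] == "ask_owner":
            send_email(owner, "Agent needs more info", step["question"])
        time.sleep(60)  # poll interval

run("schedule a call with the vendor next week")
```

My guess is the hard part isn't the loop but the guardrails: when the agent may send without review, how it threads replies, and how it knows the outcome has actually been reached.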
Nice article :) I've got some maybe-unrelated questions about the feedback loop from strong typing in code generation.
I've experimented with automated code-generating agent systems locally multiple times. I got some OK results when the project spanned multiple files without any dependencies. But when I tried to cross the milestone of creating a web app with, let's say, Flask and some DB, I hit a wall. The problems were dependency and API hallucinations, and version/API inconsistencies of the dependencies in the generated code. The system had a feedback loop with self-written tests, but even with the test output fed back it fell into a never-ending loop. Maybe the problem was Python? Maybe type annotations would help, or maybe I should use Go?
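For what it's worth, the shape of the loop I'd try is below: run a strict type checker (which is where the annotations pay off) plus the tests on the generated code, feed the errors back to the model, and cap the number of repair rounds so it can't spin forever. This is purely a hypothetical sketch: generate_code() stands in for the LLM call, and mypy/pytest are assumed to be installed in the project environment.

```python
# Hypothetical generate/check/repair loop with a hard iteration cap.
# generate_code() stands in for the actual LLM call; mypy and pytest are
# assumed to be installed in the project environment.
import subprocess
from pathlib import Path

MAX_ROUNDS = 5  # hard cap so the loop can't spin forever

def generate_code(task: str, feedback: str) -> str:
    raise NotImplementedError("call your LLM here, including `feedback` in the prompt")

def check(project_dir: Path) -> str:
    """Run the type checker and the tests; return combined error output ('' = clean)."""
    errors = []
    for cmd in (["mypy", "--strict", "."], ["pytest", "-q"]):
        result = subprocess.run(cmd, cwd=project_dir, capture_output=True, text=True)
        if result.returncode != 0:
            errors.append(result.stdout + result.stderr)
    return "\n".join(errors)

def build(task: str, project_dir: Path) -> bool:
    feedback = ""
    for round_no in range(MAX_ROUNDS):
        code = generate_code(task, feedback)
        (project_dir / "app.py").write_text(code)  # single-file example for brevity
        feedback = check(project_dir)
        if not feedback:
            return True  # type-checks and tests pass
        print(f"round {round_no}: {len(feedback.splitlines())} lines of errors")
    return False  # give up instead of looping forever

# build("todo app with Flask + SQLite", Path("generated_app"))
```

Pinning dependency versions (and putting the pinned versions into the prompt) might also help with the version/API drift you describe. Go's compiler errors would give a similar signal to mypy, but the structure of the feedback loop would be the same either way.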
Great question. I'm no expert on the topic, and my weekend project ended up being pretty small (a few hundred LOC in total). I was able to get the core logic, the "work function", to work, but all of the stuff around it I wrote by hand.