Show HN: Plandex v2 – open source AI coding agent for large projects and tasks (github.com)

257 points by danenania 86 days ago | 81 comments

maxwelljoslyn 86 days ago [-]

It seems that much like Aider, you use separate models for creating code edits and validating them. That's a win in my book. It seems Claude Code does not do that, which is part of the reason it racks up a (relatively) large bill for long work sessions (that and the cost of sonnet-3.7).

I bounce back and forth between Aider, Claude Code, and Simon Willison's LLM tool ("just" a GOOD wrapper for using LLMs at the CLI, unlike the other two which are agent-y.) LLM is my favorite because I usually don't need/want full autonomy, but Claude Code has started to win me over for straightforward stuff. Plandex looks cool enough to throw into the rotation!

My main concern at this point is that I use a Mac and as far as I understand it Docker containers can have pretty poor performance on the Mac, so I'm wondering if that will carry over to performance of Plandex. (I don't use Docker at all so I'm not sure if that's outdated info.)

danenania 86 days ago [-]

> It seems that much like Aider, you use separate models for creating code edits and validating them.

That's right. To apply edits, Plandex first attempts a deterministic edit based on the edit snippet. In some cases this can be used without validation, and in others a validation step is needed. A "race" is then orchestrated with o3-mini between an aider-style diff edit, a whole file build, and (on the cloud service) a specialized model. I actually wrote a comment about how this works (while maintaining efficiency/cost-effectiveness) a couple days ago: https://news.ycombinator.com/item?id=43673412

And on the Docker question, it should be working well on Mac.

lsdkfjlkasfj 85 days ago [-]

What are the main differences from aider?

danenania 85 days ago [-]

A few differences:

- Plandex is more agentic—it can complete a complex task, updating many files, all in one go.

- Changes are applied to a sandbox by default rather than directly to project files, helping you prevent unintended changes.

- Plandex can automatically find the context it needs in the project.

- Plandex can execute commands (like installing dependencies, running tests, etc.) and auto-debug if they fail.

- Plandex should be more reliable on file edits—it uses an enhanced version of aider's diff-style edit that is resilient to multiple occurrences, but it also has validation, a whole file fallback, and on the cloud service, a custom fast apply model is also added to the mix. Will be publishing benchmarks on this soon.

Onawa 86 days ago [-]

Docker containers can be somewhat slower due to most available Docker images targeting x86. If you build for Arm, it should be better.

erikcelander 86 days ago [-]

orbstack > docker on mac

prophesi 86 days ago [-]

Should be noted that it's proprietary and requires paying for a license for commercial/business use.

https://docs.orbstack.dev/faq#free

volkk 86 days ago [-]

switched to this and never looked back

sagarpatil 86 days ago [-]

I spend hours everyday researching AI products. Why am I only hearing about plandex now? Looks very promising, I’ll give it a try. Please up your marketing game, the product looks solid!

viraptor 86 days ago [-]

Maybe because they had a release hiatus for a bit. V1 was cool and had great ideas / interface, but needed some polish in practice (new models weren't well supported, it got stuck in endless cycles, etc.) The explicit planning, server model, visible steps and a few other novel things were impressive though.

Hopefully the V2 will bring the polish and reliability.

danenania 86 days ago [-]

Yeah, you're spot on. Rather than just continuously marketing and iterating on the v1, which was designed more for AI codegen's early experimental phase, I decided to go heads down on engineering and try to get to more of a step change on reliability and robustness.

It's not a total rewrite and is still based on v1's foundations, but quite a lot of the core functionality has been reworked.

LeonidBugaev 86 days ago [-]

You should check https://probeai.dev/ too. That's one of those building blocks which make AI truly understand the code.

djhn 85 days ago [-]

Are you getting paid to research AI products in some capacity or is this a personal interest and time investment?

sagarpatil 82 days ago [-]

Hey. It’s a personal interest. I don’t know how and when it started but now it’s become a habit.

danenania 86 days ago [-]

Thanks! Looking forward to hearing your thoughts.

> Please up your marketing game, the product looks solid!

Working on it!

chrisweekly 86 days ago [-]

This looks powerful and useful. Also, IMHO it's an exemplary "Show HN"; nice job w/ the description.

danenania 86 days ago [-]

Thanks for the kind words! I really appreciate it.

lightdot 86 days ago [-]

From the Github page: "curl -sL https://plandex.ai/install.sh | bash"

Enticing users to blindly run remote 3rd party code on their machines is IMHO not a proper thing to do.

This approach creates a dangerous mindset when it comes to security and good practices in general.

danenania 86 days ago [-]

You can read the script before installing. It's pretty straightforward—just grabs the appropriate binary from GitHub and puts it in /usr/local/bin.

Installing via package managers or installers also runs remote 3rd party code on your machine, so I don't see much difference from a security perspective. You should make sure you trust the source before installing anything.

lightdot 86 days ago [-]

Of course one can and should read the script before running it, but the instructions promote just the opposite.

Even if we skip a step ahead and consider that this script then installs a binary blob... the situation doesn't get any better, does it?

If you find any of this as something normal and acceptable, I can only strongly disagree. Such bad practices should be discouraged.

On the other hand, using a distro's package manager and a set of community approved packages is a far better choice when installing software, security-wise. I really don't see how you could compare the two without plainly seeing the difference, from a security perspective.

As an alternative, if the software is not available through a distro's package manager, one should inspect and compile the code. This project provides the instructions to do so, they are just not promoted as a first choice.

I can't help concluding that you've largely made my point about bad practices and having a wrong mindset when it comes to software security.

danenania 86 days ago [-]

Well, I simply disagree with you that it's a "bad practice", and I have a fair amount of security experience. But you're entitled to your opinion.

You can also build from source if you prefer: https://docs.plandex.ai/install/#build-from-source

stronglikedan 86 days ago [-]

The instructions presume that one would follow best practices when installing something where the source is available, and don't need to explicitly include all the steps to do so in this context. You are correct in that it would be bad practice to blindly install something, but knowing what you are installing is the first step to installing when you are following best practices. That onus is on the person doing the installing, not the installation instructions.

bottled_poe 85 days ago [-]

How is this any different to downloading and running a binary?

Yiling-J 86 days ago [-]

I noticed there's an example in the docs: plandex rm app/**/*.ts # by glob pattern.

However, looking at the code (https://github.com/plandex-ai/plandex/blob/main/app/cli/cmd/...), it seems you're using path/filepath for pattern matching, which doesn't support double star patterns. Here's a playground example showing that: https://go.dev/play/p/n8mFpJn-9iY
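To make the point concrete, here is a small Go sketch of how `filepath.Match` treats `**`. In the standard library, `*` matches any run of non-separator characters, so `**` behaves like a single `*` and matches exactly one path segment. The `matches` and `findBySuffix` helpers are hypothetical names for illustration. The walk-based workaround is just one possible approach, not what Plandex does. The expected results assume a `/` path separator (Linux/macOS).

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// matches reports whether filepath.Match accepts the path,
// treating a malformed pattern as "no match".
func matches(pattern, path string) bool {
	ok, err := filepath.Match(pattern, path)
	return err == nil && ok
}

// findBySuffix is a hypothetical workaround for recursive matching:
// walk root and collect every regular entry whose path ends in suffix.
func findBySuffix(root, suffix string) ([]string, error) {
	var out []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(p, suffix) {
			out = append(out, p)
		}
		return nil
	})
	return out, err
}

func main() {
	pattern := "app/**/*.ts"
	fmt.Println(matches(pattern, "app/x/y.ts"))   // true: "**" matched the single segment "x"
	fmt.Println(matches(pattern, "app/y.ts"))     // false: "**" cannot match zero segments
	fmt.Println(matches(pattern, "app/a/b/y.ts")) // false: "**" cannot span "a/b"
}
```

So with `path/filepath` alone, `app/**/*.ts` only reaches files exactly one directory below `app`. Note that an unquoted glob may also be expanded by the shell before the program ever sees it, which can hide the difference.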

danenania 86 days ago [-]

Thanks for flagging—I'll take a look at this.

jtwaleson 86 days ago [-]

Nice! I tried it out when you launched last year but found it pretty expensive to use. I believe I spent $5 for half an hour of coding or so. Can you share what the typical costs are now, since the model prices have changed significantly?

danenania 86 days ago [-]

It's a bit hard to give "typical" costs because it's so dependent on how you use it. The project map size (which scales with overall project size) and the number/size of relevant files are the main drivers of cost, so working in large existing codebases will be a lot more expensive than generating a new app from scratch.

Taking Plandex's codebase as an example, it's certainly not huge but is getting to be decent-sized—I just ran a count and it's at about 200k lines (mostly Go), which translates to a project map of ~43k tokens. I have a task in progress right now to add a json config file for model settings and other project settings. To get to a pretty good initial version of this feature, I first did a fair amount of back-and-forth in 'chat mode' to pin down the details (maybe 10 or so prompts) and then an implementation phase where ~15 files were updated. The cost so far is at a little under $10.

dr_kiszonka 85 days ago [-]

Hi. Nice product!

Let's say I have a repo for an NLP project. One directory contains a few thousand text files. Can I tell Plandex to never ever index and access them? For my use case, I wish projects in this space always asked me before accessing anything. Claude recently decided to install seven Python packages and grabbed full terminal output following installation, which turned out pretty expensive (and useless).

danenania 85 days ago [-]

Hi, thanks! Yes, you could either:

- Add that directory to either .gitignore (in a git repo) or a .plandexignore file (which uses gitignore syntax).

- Switch to a mode where context is not loaded automatically and you choose the files yourself instead (more on this here: https://docs.plandex.ai/core-concepts/autonomy).

jtwaleson 86 days ago [-]

Thanks! Quite a bit more money than Cursor (probably better quality, as Cursor's context is limited) but still peanuts compared to hiring someone :)

jmcpheron 86 days ago [-]

Plandex was one of the first agentic-style coding systems I tried several months ago, and it worked very well. But I've been using the Cursor and Windsurf style editors more recently because of their popularity. And their effectiveness is honestly pretty great.

Would you classify Plandex as more similar to a terminal interface like Claude Code? Also it looks like Open AI released a similar terminal based tool today. https://github.com/openai/codex

Do you see any obvious distinctions or pros/cons between the terminal tools and the IDE systems?

danenania 86 days ago [-]

> Would you classify Plandex as more similar to a terminal interface like Claude Code? Also it looks like Open AI released a similar terminal based tool today. https://github.com/openai/codex

Yes, I would say Plandex is generally similar in spirit to both Claude Code and OpenAI's new Codex tool. All three are agentic coding tools with a CLI interface.

A couple areas where I think Plandex can have an edge:

- Most importantly, it's almost never the case these days that a single provider offers the best models across the board for coding. Instead, each provider has their strengths and weaknesses. By slotting in the best model for each role, regardless of provider, Plandex is able to get the best of all worlds. For example, it currently uses Sonnet 3.7 by default for planning and coding, which by most accounts is still the best coding model. But for the narrow task of file edits, o3-mini offers drastically better speed, cost, and overall results. Similarly, if you go above Sonnet 3.7's context limit (200k tokens), Plandex can seamlessly move you over to a Gemini model.

- It offers some unique features, like writing all changes to a sandbox by default instead of directly to project files, that in my experience make a big difference for getting usable results and not leaving behind a mess by accident. I won't list all the features again here, but if you go through the README, I think you'll find a number of capabilities are quite helpful and aren't offered by other tools.

> Do you see an obvious distinctions or pros/cons between the terminal tools and the IDE systems?

I'm a Cursor subscriber and I use both Cursor and Plandex regularly for different kinds of tasks. For me, Cursor works better for smaller, more localized changes, while Plandex offers a better workflow for tasks that involve many steps, many files, or need many attempts to figure out the right prompt (since Plandex has more robust version control). I think once you are editing many files in one go, the IDE tab-based paradigm starts to break down a bit and it can become difficult to keep a high level perspective on everything that's changing.

Also, I'd say the terminal is naturally a better fit for executing scripts, installing dependencies, running tests and so on. It has your environment already configured, and it's able to control execution in a much more structured and reliable way. Plandex, for example, can tentatively apply a bunch of pending changes to your project, execute an LLM-generated script, and then roll back everything if the script fails. It's pretty hard to achieve that kind of low-level process control from an IDE.

killerstorm 86 days ago [-]

I like the idea but it did not quite work out of box.

There was some issue with sign-in: it seems the pin requested via web does not work in the console (so the web page suggesting the --pin option is misleading).

I tried BYO plan as I already have openrouter API key. But it seems like default model pack splits its API use between openrouter and openai, and I ended up stuck with "o3-mini does not exist".

And my whole motivation was basically trying Gemini 2.5 Pro, but it seems like that requires some trial-and-error configuration. (gemini-exp pack doesn't quite work now.)

The difference between FOSS and BYO plan is not clear: seems like installation process is different, but is the benefit of paid plan that it would store my stuff on server? I'd really rather not TBH, so it has negative value.

danenania 86 days ago [-]

Thanks for trying it!

Could you explain in a bit more detail what went wrong for you with sign-in and the pin? Did you get an error message?

On OpenRouter vs. OpenAI, see my other comment in this thread (https://news.ycombinator.com/item?id=43719681). I'll try to make this smoother.

On Gemini 2.5 Pro: the new paid 2.5 pro preview will be added soon, which will address this. The free OpenRouter 2.5 pro experimental model is hit or miss because it uses OpenRouter's quota with Google. So if it's getting used heavily by other OpenRouter users, it can end up being exhausted for all users.

On the cloud BYO plan, I'd say the main benefits are:

- Truly zero dependency (no need for docker, docker-compose, and git).

- Easy to access your plans on multiple devices.

- File edits are significantly faster and cheaper, and a bit more reliable, thanks to a custom fast apply model.

- There are some foundations in place for organizations/teams, in case you might want to collaborate on a plan or share plans with others, but that's more of a 'coming soon' for now.

If you use the 'Integrated Models' option (rather than BYO), there are also some useful billing and spend management features.

But if you don't find any of those things valuable, then the FOSS could be the best choice for you.

killerstorm 86 days ago [-]

When I used `--pin` argument I got an error message along the lines of "not found in the table".

I got it working by switching to oss model pack and specifying G2.5P on top. Also works with anthropic pack.

But I'm quite disappointed with UX - there's a lot of configuration options but robustness is severely lacking.

Oddly, in the default mode out of box it does not want to discuss the plan with me but just jumps to implementation.

And when it's done writing code it aggressively wants me to decide whether to apply -- there's no option to discuss changes, rewind back to planning, etc. Just "APPLY OR REJECT!!!". Even Ctrl-C does not work! Not what I expected from software focused on planning...

danenania 85 days ago [-]

Thanks, I appreciate the feedback.

> Oddly, in the default mode out of box it does not want to discuss the plan with me but just jumps to implementation.

It should be starting you out in "chat mode". Do you mean that you're prompted to begin implementation at the end of the chat response? You can just choose the 'no' option if that's the case and keep chatting.

Once you're in 'tell mode', you can always switch back to chat mode with the '\chat' command if you don't want anything to be implemented.

> And when it's done writing code it aggressively wants me to decide whether to apply -- there's no option to discuss changes, rewind back to planning, etc. Just "APPLY OR REJECT!!!". Even Ctrl-C does not work! Not what I expected from software focused on planning...

This is just a menu to make the commands you're most likely to need after a set of changes is finished. If you press 'enter', you'll return back to the repl prompt where you can discuss the changes (switch back to chat mode with \chat if you only want to discuss, rather than iterate), or use commands (like \rewind) as needed.

killerstorm 85 days ago [-]

Here's what happened:

  1. It started formulating the plan.
  2. Got an error from the provider (it seems the model set sometimes randomly resets to default?!?)
  3. After I switched to a different provider, I wanted it to continue planning, so I used the \continue command.
  4. But when it gets the \continue command it starts writing code without asking anything!
  5. In the end it was still in chat mode. I never switched to tell mode, I just wanted it to keep planning.

Here's an excerpt: https://gist.github.com/killerstorm/ad8afa19b2f55588eb317138...

It went from entry 3 "Made Plan" to entry 4 and so on without any input from my end.

I could not reproduce the second issue this time: I didn't get the same menu and it was more chill.

danenania 85 days ago [-]

I see, it sounds like \continue is the issue—this command is designed to continue with implementation rather than with a chat, so it switches you into 'tell mode'. I'll try to make that clearer, or to make it better handle chat mode. I can definitely see how it would be confusing.

The model pack shouldn't be resetting, but a potential gotcha is that model settings are version controlled, so if you rewind to a point in the plan before the model settings were changed, you can undo those changes. Any chance that's what happened? It's a bit of a tradeoff since having those settings version controlled can also be useful in various ways.

This feedback is very valuable, so thanks again!

throwup238 86 days ago [-]

The installation process for the FOSS version includes both the CLI (which is also used for the cloud version) and a docker-compose file for the server components. Last time I tried it (v1) it was quite clunky but yesterday with v2 it was quite a bit easier, with an explicit localhost option when using plandex login.

danenania 86 days ago [-]

I'm glad to hear it went smoothly for you. It was definitely clunky in v1.

throwup238 86 days ago [-]

I would get rid of the email validation code for localhost, though. That remains the biggest annoyance when running it locally as a single user. I would also add a $@ to the docker-compose call in the bash start script so users can start it in detached mode.

danenania 86 days ago [-]

It should already be skipping the email validation step in local mode. Is it showing up for you?

I’ll look into the detached mode, thanks!

throwup238 86 days ago [-]

Yes, it showed up for me, luckily I had the logs open and remembered that was the solution in v1 (it wasn’t documented back then iirc). I git pulled in the same directory I ran v1 in so maybe there’s some sort of left over config or something?

danenania 85 days ago [-]

Email pins are disabled based on a LOCAL_MODE environment variable, which is set in the docker-compose config. I'll take a look.

vunderba 86 days ago [-]

Yeah, I noticed that (needing a dedicated OpenAI key) as well for the BYO key plan. It's a little bit odd considering that OpenRouter has access to the OpenAI models.

https://openrouter.ai/openai

danenania 86 days ago [-]

OpenRouter charges a bit extra on credits, and adds some latency with the extra hop, so I decided to keep the OpenAI calls direct by default.

I hear you though that it's a bit of extra hassle to need two accounts, and you're right that it could just use OpenRouter only. The OpenRouter OpenAI endpoints are included as built-in models in Plandex (and can be used via \set-model or a custom model pack - https://docs.plandex.ai/models/model-settings).

I'm also working on allowing direct model provider access in general so that OpenRouter can be optional.

Maybe a quick onboard flow to choose preferred models/providers would be helpful when starting out (OpenRouter only, OpenRouter + OpenAI, direct providers only, etc.).

killerstorm 86 days ago [-]

FWIW I got it working by selecting oss or anthropic model packs. But I had some OpenAI key... maybe it would work with a dummy.

MadsRC 86 days ago [-]

This looks great!

With the self-host option, it’s not really clear through the docs if one is able to override the base url of the different model providers?

I’m running my own OpenAI, Anthropic, Vertex and Bedrock compatible API, can I have it use that instead?

danenania 86 days ago [-]

Thanks!

Yes, you can add 'custom models' and set the base url. More on this here: https://docs.plandex.ai/models/model-settings

ramesh31 86 days ago [-]

CLI interfaces are not where it's at for this stuff.

What makes Cline the king of codegen agents right now IMO (from a UI/UX perspective) is how well they handle displaying the code, opening files, and scrolling the cursor as it changes. Even in a fully autonomous agentic flow, you still really want to be reading the code as it is written, to maintain context and keep steering it correctly. Having to go back and look at a huge diff after all of your updates is a real pain and slows things down.

victor9000 86 days ago [-]

UI/UX is one of the last things I'm worried about when leveraging generative tools. When I need to inspect what the model has done, a quick git diff does the trick.

nidnogg 86 days ago [-]

Good luck with a quick git diff spanning thousands of lines. AI code can be notoriously verbose and over engineered. Of course you want to be in control.

danenania 86 days ago [-]

Just to note, apart from git diff style diffs, Plandex also offers a web UI for side-by-side diff comparison—similar to GitHub's PR review UI.

zamalek 86 days ago [-]

Have you considered adding LSP support? I anticipate go-to-definition/implementation and go-to-usages being pretty useful via MCP or function calling. I started doing this for an internal tool a while back (to help with understanding some really poorly written Ruby) but I don't find any joy in coding this kind of stuff and have been hoping for someone else to do it instead.

danenania 86 days ago [-]

Yeah, I've definitely thought about this. I would likely try to do it through tree-sitter to keep it as light and language-agnostic as possible vs. language-specific LSP integrations, but I agree it could be very helpful.

gcanyon 86 days ago [-]

> It has an effective context window of 2M tokens, and can index projects of 20M tokens and beyond using tree-sitter project maps (30+ languages are supported). It can effectively find relevant context in massive million-line projects like SQLite, Redis, and Git.

Does this possibly have non-coding-related utility for general reasoning about large volumes of text?

danenania 86 days ago [-]

The project map supports markdown files (and html), so you could definitely use it to explore docs, notes, etc. if they're all in markdown/html. Plaintext files aren't currently mapped though, so just the file name would be used to determine whether to load those.

iambateman 86 days ago [-]

Really cool! Looking forward to checking this out.

I really like my IDE (PHPStorm) but I want Cursor-like functionality, where it’s aware of my codebase and able to make changes iteratively. It sounds like this is what I need?

Excited to give this a go, thanks for sharing.

Btw one of the videos is private.

danenania 86 days ago [-]

Thanks! I'd love to hear your feedback.

> I want Cursor-like functionality, where it’s aware of my codebase and able to make changes iteratively. It sounds like this is what I need?

Yes, Plandex uses a tree-sitter project map to identify relevant context, then makes a detailed plan, then implements each step in the plan.

> Btw one of the videos is private.

Oops, which video did you mean? Just checked them all on the README and website in incognito mode and they all seem to be working for me.

iambateman 86 days ago [-]

The tutorial video : https://www.youtube.com/watch?v=VCegxOCAPq0

danenania 86 days ago [-]

Oh I see, in the HN post above. Sorry about that! Seems it's too late for me to edit, but here's the correct URL - https://youtu.be/g-_76U_nK0Y

I'll ping the mods to see if they can edit it.

ako 86 days ago [-]

Interesting to see that even with these types of tools, coding it took 8 months. That is not the general impression people have of AI-assisted coding. Any thoughts on how you could improve Plandex to bring down 8 months to 1 month or less?

danenania 86 days ago [-]

Another way to think about it is the 8 months of work I did would have taken years without help from AI tools (including Plandex itself).

elliot07 86 days ago [-]

Congrats on the V2 launch. Does Plandex support MCP? Will take it for a test drive tonight.

danenania 86 days ago [-]

Thanks! It doesn't support MCP yet, but it has some MCP-like features built-in. For example, it can launch a browser, pull in console logs or errors, and send them to the model for debugging (either step-by-step or fully automated).

ErikBjare 86 days ago [-]

Awesome to see you're still at it. v2 looks great, I will take it for a spin.

danenania 86 days ago [-]

Thanks Erik! I'd love to hear your thoughts.

greggh 83 days ago [-]

Have you tested any local models through ollama? Did any work well enough to recommend?

mertleee 86 days ago [-]

CLI is the worst possible interface for coding llms. Especially for "larger" projects.

danenania 86 days ago [-]

There are pros and cons to different interfaces for sure. Personally, I'd want to have a CLI-based codegen tool in my toolkit even if I hadn't created Plandex, as there are benefits (environment configuration, execution control, file management, piping data, to name a few) that you can't easily get outside of a CLI.

I also personally find the IDE unwieldy for reviewing large diffs (e.g. dozens of files). I much prefer a vertically-scrolling side-by-side comparison view like GitHub's PR review UI (which Plandex offers).

shotgun 86 days ago [-]

Are you saying GUI IDEs are best? Or is there an ideal kind of interface we haven't yet seen?

esafak 86 days ago [-]

I think you should have put the "terminal-based" qualifier in the title and lede.

danenania 86 days ago [-]

Yeah that's fair enough. The way I look at it though, a lot of the value of Plandex is in the underlying infrastructure. While I've stuck with the terminal so far in order to try to keep a narrower focus, I plan to add other clients in the future.

lsllc 86 days ago [-]

You should talk to the folks at Warp, this plus Warp would be pretty interesting.

lsllc 86 days ago [-]

The link in the README.md to "local-mode quickstart" seems broken.

danenania 86 days ago [-]

Do you mean at the top of the README or somewhere else? The link at the top seems to be working for me.

lsllc 86 days ago [-]

In the table showing the hosting options lower down in the README, the 3rd row is titled "Self-hosted/Local Mode", but the link in "Follow the *local-mode quickstart* to get started." goes to a GH 404 page.

danenania 86 days ago [-]

I see—fixed it, thanks!

handfuloflight 86 days ago [-]

How efficient is it in constructing context?

danenania 86 days ago [-]

It constructs context to maximize cacheability. The context window is also carefully managed for efficiency and focus. During implementation steps, only the relevant files are loaded.

sreejithr 86 days ago [-]

we cooked?

curtisszmania 86 days ago [-]

[dead]

longmabacgi4 86 days ago [-]

[flagged]