Show HN: Magnitude – Open-source AI browser automation framework (github.com)

48 points by anerli 7 hours ago | 18 comments

rozap 3 hours ago [-]

There are a number of these out there, and this one has a super easy setup and appears to Just Work, so nice job on that. I had it going and producing plausible results within a minute or so.

One thing I'm wondering is if there's anyone doing this at scale? The issue I see is that with complex workflows which take several dozen steps and have complex control flow, the probability of reaching the end falls off pretty hard, because if each step has a .95 chance of completing successfully, after not very many steps you have a pretty small overall probability of success. These use cases are high value because writing a traditional scraper is a huge pain, but we just don't seem to be there yet.
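To put rough numbers on that falloff, here is a minimal sketch. It assumes every step succeeds independently with the same probability, which real workflows will not satisfy exactly; the function name `overall_success` is just for illustration.

```python
def overall_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps
    that each succeed with probability `per_step`."""
    return per_step ** steps


# Compare a few workflow lengths at the 0.95 per-step rate from the comment.
for steps in (10, 20, 50):
    print(f"{steps} steps: {overall_success(0.95, steps):.3f}")
```

Under that assumption, a 20-step workflow finishes a bit more than a third of the time, and a 50-step workflow finishes less than 8% of the time.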

The other side of the coin is simple workflows, but those tend to be the workflows where writing a scraper is pretty trivial. This did work, and I told it to search for a product at a local store, but the program cost $1.05 to run. So doing it at any scale quickly becomes a little bit silly.

So I guess my question is: who is having luck using these tools, and what are you using them for?

One route I had some success with is writing a DSL for scraping and then having the LLM generate that code, then interpreting it and editing it when it gets stuck. But then there's the "getting stuck detection" part, which is hard, etc.

anerli 3 hours ago [-]

Glad you were able to get it set up quickly!

We currently are optimizing for reliability and quality, which is why we suggest Claude - but it can get expensive in some cases. Using Qwen 2.5-VL-72B will be significantly cheaper, though it may not always be reliable.

Most of our usage right now is for running test cases, and people seem to often prefer Qwen for that use case - since it's typically clearer how to execute a test case.

Something that is top of mind for us is figuring out a good way to "cache" workflows that get taken. This way you can repeat automations either with no LLM or with a smaller/cheaper LLM. This would enable deterministic, repeatable flows that are also very affordable and fast. So even if each step on the first run is only 95% reliable - if it gets through it, it could repeat it with 100% reliability.
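One way to picture the caching idea is a record-then-replay loop. The sketch below is hypothetical and is not Magnitude's API: `run_with_cache`, `plan_with_llm`, `execute`, and the action dictionaries are all invented for illustration, and the "LLM" is a stand-in function. It plans once, saves the actions only after they all execute without raising, and replays them from disk on later runs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Action = dict  # hypothetical shape, e.g. {"type": "click", "x": 120, "y": 340}


def run_with_cache(
    task: str,
    cache_path: Path,
    plan_with_llm: Callable[[str], list],
    execute: Callable[[Action], None],
) -> str:
    """Run a task, replaying cached actions when available.

    Returns "cache" if the actions came from disk, "llm" otherwise.
    """
    if cache_path.exists():
        actions = json.loads(cache_path.read_text())
        source = "cache"
    else:
        actions = plan_with_llm(task)  # the expensive, less reliable path
        source = "llm"

    for action in actions:
        execute(action)  # assumed to raise if the step fails

    # Only persist a flow that made it all the way through.
    if source == "llm":
        cache_path.write_text(json.dumps(actions))
    return source


# Demo with stand-ins for the planner and the browser.
planner_calls = []
executed = []


def fake_planner(task: str) -> list:
    planner_calls.append(task)
    return [{"type": "click", "x": 120, "y": 340}, {"type": "type", "text": task}]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "flow.json"
    first = run_with_cache("search for a product", path, fake_planner, executed.append)
    second = run_with_cache("search for a product", path, fake_planner, executed.append)

print(first, second, len(planner_calls))
```

A replay like this is only deterministic while the page stays the same, so a real system would presumably need to fall back to the model when a cached action no longer applies.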

ewired 1 hours ago [-]

It was interesting to find out that Qwen 2.5 VL can output coordinates like Sonnet 4, or does that use a different implementation?

anerli 39 minutes ago [-]

Both of them are "visually grounded" - meaning if you ask for the location of something in an image - they can output the exact x/y pixel coordinates! Not many models can do this, especially not many that are large enough to actually reason through sequences of actions well

axlee 3 hours ago [-]

Using this for testing instead of regular playwright must increase the cost and run time 10000x, doesn't it? At what point do the benefits outweigh the costs?

anerli 3 hours ago [-]

I think it depends a lot on how much you value your own time, since it's quite time consuming to write and update playwright scripts. It's gonna save you developer hours to write automations using natural language rather than messing around with and fixing selectors. It's also able to handle tasks that playwright wouldn't be able to do at all - like extracting structured data from a messy/ambiguous DOM and adapting automatically to changing situations.

You can also use cheaper models depending on your needs, for example Qwen 2.5 VL 72B is pretty affordable and works pretty well for most situations.

plufz 2 hours ago [-]

But we can use an LLM to write that script though and give that agent access to a browser to find DOM selectors etc. And then we have a stable script where we, if needed, can manually fix any LLM bugs just once…? I’m sure there are use cases with messy selectors as you say, but for me it feels like most cases are better covered by generating scripts.

anerli 2 hours ago [-]

Yeah we've thought about this approach a lot - but the problem is if your final program is a brittle script, you're gonna need a way to fix it again often - and then you're still depending on recurrently using LLMs/agents. So we think it's better to have the program itself be resilient to change instead of you/your LLM assistant having to constantly ensure the program is working.

grbsh 6 hours ago [-]

Why not just use Claude by itself? Opus and Sonnet are great at producing pixel coordinates and tool usages from screenshots of UIs. Curious as to what your framework gives me over the plain base model.

anerli 6 hours ago [-]

Hey! To have a framework that can effectively control browser agents, you need systems to interact with the browser, but also pass relevant content from the page to the LLM. Our framework manages this agent loop in a way that enables flexible agentic execution that can mix with your own code - giving you control but in a convenient way. Claude and OpenAI computer use APIs/loops are slower, more expensive, and tailored for a limited set of desktop automation use cases rather than robust browser automations.

KeysToHeaven 6 hours ago [-]

Finally, a browser agent that doesn’t panic at the sight of a canvas

anerli 6 hours ago [-]

Exactly :)

revskill 5 hours ago [-]

Not sure about this because you're the author.

anerli 5 hours ago [-]

Try it out and report back!

revskill 4 hours ago [-]

legucy 4 hours ago [-]

Classic new age hacker news hostility. Do you think this response adds anything?

owebmaster 34 minutes ago [-]

I do, cheap praise doesn't benefit the community and it might be astroturf. Constructive criticism would be more valuable - there are multiple similar projects like this posted here daily, and this one likely isn't the best.

anerli 4 minutes ago [-]

For context, we have no affiliation with KeysToHeaven (though we appreciate his comment). We do think our vision-first approach gives us a significant edge over other browser agents, though we probably could’ve made that aspect clearer in the title

Abubaker761 6 hours ago [-]

[dead]
