Gemini 2.5 Flash (developers.googleblog.com)
mmaunder 37 minutes ago [-]
More great innovation from Google. OpenAI has two major problems.

The first is Google's vertically integrated chip pipeline and deep supply chain and operational knowledge when it comes to creating AI chips and putting them into production. They have a massive cost advantage at every step. This translates into more free services, cheaper paid services, more capabilities due to more affordable compute, and far more growth.

Second problem is data starvation and the unfair advantage that social media has when it comes to a source of continually refreshed knowledge. Now that the foundational model providers have churned through the common crawl and are competing to consume things like video and whatever is left, new data is becoming increasingly valuable as a differentiator, and more importantly, as a provider of sustained value for years to come.

SamA has signaled both of these problems: he made noises about building a fab a while back, and more recently he's been making noises about launching a social media platform off OpenAI. The smart money among his investors knows these issues are fundamental in deciding whether OAI will succeed or not, and is asking the hard questions.

If the only answer for both is "we'll build it from scratch", OpenAI is in very big trouble. And it seems that that is the best answer that SamA can come up with. I continue to believe that OpenAI will be the Netscape of the AI revolution.

The win is Google's for the taking, if they can get out of their own way.

throwup238 22 minutes ago [-]
Nobody has really talked about what I think is an advantage just as powerful as the custom chips: Google Books. They already won a landmark fair use lawsuit against book publishers, digitized more books than anyone on earth, and used their Captcha service to crowdsource its OCR. They've got the best* legal cover and all of the best sources of human knowledge already there. Then Youtube for video.

The chips of course push them over the top. I don't know how much Deep Research is costing them but it's by far the best experience with AI I've had so far with a generous 20/day rate limit. At this point I must be using up at least 5-10 compute hours a day. Until about a week ago I had almost completely written off Google.

* For what it's worth, I don't know. IANAL

paxys 3 minutes ago [-]
LibGen already exists, and all the top LLM publishers use it. I don't know if Google's own book index provides a big technical or legal advantage.
dynm 13 minutes ago [-]
The amount of text in books is surprisingly finite. My best estimate was that there are ~10¹³ tokens available in all books (https://dynomight.net/scaling/#scaling-data), which is less than frontier models are already being trained on. On the other hand, book tokens are probably much "better" than random internet tokens. Wikipedia for example seems to get much higher weight than other sources, and it's only ~3×10¹⁰ tokens.
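Those orders of magnitude are easy to sanity-check. The 10^13 and 3×10^10 figures are the parent's estimates; the frontier-run figure below is purely an illustrative assumption, not a known number:

```python
# Rough scale comparison using the parent comment's estimates.
BOOK_TOKENS = 1e13       # estimated tokens in all books
WIKI_TOKENS = 3e10       # estimated tokens in Wikipedia
FRONTIER_RUN = 1.5e13    # illustrative size of a frontier training run

print(f"books vs. one frontier run: {BOOK_TOKENS / FRONTIER_RUN:.2f}x")
print(f"books vs. Wikipedia:        {BOOK_TOKENS / WIKI_TOKENS:.0f}x")
```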
Keyframe 28 minutes ago [-]
Google has the data and has the hardware, not to mention software and infrastructure talent. Once this Bismarck turns around, and it looks like it is turning, who can parry it for real? They have internet.zip and all the previous versions as well; they have youtube, email, search, books, traffic, maps and the businesses on them, phones and habits around them, even the OG social network, the usenet. It's a sleeping giant starting to wake up and it's already causing commotion; let's see what it does when it drinks its morning coffee.
kriro 16 minutes ago [-]
Agreed. One of Google's big advantages is the data access and integrations. They are also positioned really well for the "AI as entertainment" sector with youtube, which will be huge (imo). They also have the knowledge in adtech, and injecting ads into AI is an obvious play. As is harvesting AI chat data.

Meta and Google are the long term players to watch as Meta also has similar access (Insta, FB, WhatsApp).

eastbound 9 minutes ago [-]
They have the Excel spreadsheets of all startups and businesses of the world (well 50/50 with Microsoft).

And Atlassian has all the project data.

whyenot 27 minutes ago [-]
Another advantage that Google has is the deep integration of Gemini into Google Office products and Gmail. I was part of a pilot group and got to use a pre-release version and it's really powerful and not something that will be easy for OpenAI to match.
mmaunder 22 minutes ago [-]
Agreed. Once they dial in the training for sheets it's going to be incredible. I'm already using notebooklm to upload finance PDFs, then having it generate tabular data and copypasta into sheets, but it's a garage solution compared to just telling it to create or update a sheet with parsed data from other sheets, PDFs, docs, etc.

And as far as Gmail goes, I periodically try to ask it to unsubscribe from everything marketing-related (and not from my own company), but it's not even close to being there. I think there will continue to be a gap in the market for more aggressive email integration with AI, given how useless email has become. I know A16Z has invested in a startup working on this. I doubt Gmail will integrate as deeply as is possible, so the opportunity will remain.

Workaccount2 9 minutes ago [-]
I frankly doubt the future of office products. In the last month I have ditched two separate Excel productivity templates in favor of bespoke wrappers on sqlite databases, written by Claude and Gemini. Easier to use and probably 10x as fast.

You don't need a 50 function swiss army knife when your pocket can just generate the exact tool you need.

stefan_ 13 minutes ago [-]
I don't know man, for months now people keep telling me on HN how "Google is winning", yet no normal person I ever asked knows what the fuck "Gemini" is. I don't know what they are winning, it might be internet points for all I know.

Actually, some of the people polled recalled the Google AI efforts by their expert system recommending glue on pizza and smoking in pregnancy. It's a big joke.

mmaunder 5 minutes ago [-]
Try uploading a bunch of PDF bank statements to notebooklm and ask it questions. Or the results of blood work. It's jaw dropping. e.g. uploaded 7 brokerage account statements as PDFs in a mess of formats and asked it to generate table summary data which it nailed, and then asked it to generate actual trades to go from current position to a new position in shortest path, and it nailed that too.

Biggest issue we have when using notebooklm is a lack of ambition when it comes to the questions we're asking. And the pro version supports up to 300 documents.

Hell, we uploaded the entire Euro Cyber Resilience Act and asked the same questions we were going to ask our big name legal firm, and it nailed every one.

But you actually make a fair point, which I'm seeing too and I find quite exciting. And it's that even among my early adopter and technology minded friends, adoption of the most powerful AI tools is very low. e.g. many of them don't even know that notebookLM exists. My interpretation on this is that it's VERY early days, which is suuuuuper exciting for us builders and innovators here on HN.

kube-system 5 minutes ago [-]
While there are some first-party B2C applications like chat front-ends built using LLMs, once mature, the end game is almost certainly that these are going to be B2B products integrated into other things. The future here goes a lot further than ChatGPT.
zoogeny 24 minutes ago [-]
If the battle was between Altman and Pichai I'd have my doubts.

But the battle is between Altman and Hassabis.

I recall some advice on investment from Buffett regarding how he invests in the management team.

mdp2021 9 minutes ago [-]
Could you please expand, on both your points?
peterjliu 19 minutes ago [-]
another advantage is that people want the Google bot to crawl their pages, unlike most AI companies' crawlers
mmaunder 13 minutes ago [-]
This is an underrated comment. Yes it's a big advantage and probably a measurable pain point for Anthropic and OpenAI. In fact you could just do a 1% survey of robots.txt out there and get a reasonable picture. Maybe a fun project for an HN'er.
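A minimal sketch of that survey idea using only the stdlib robots.txt parser. The sampled file is fabricated; GPTBot and ClaudeBot are published AI crawler user-agents, but treat the list as illustrative:

```python
from urllib.robotparser import RobotFileParser

# Crawler user-agents to check (illustrative list).
BOTS = ["GPTBot", "ClaudeBot", "Googlebot"]

def classify(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return, per bot, whether this robots.txt lets it fetch `path`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, path) for bot in BOTS}

# A hypothetical robots.txt of the kind the 1% survey would sample:
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""
print(classify(sample))  # Googlebot allowed, AI bots blocked
```

Run that over a sample of domains and tally the dict values, and you have the survey.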
jiocrag 12 minutes ago [-]
Excellent point. If they can figure out how to either remunerate or drive traffic to third parties in conjunction with this, it would be huge.
jbverschoor 35 minutes ago [-]
Except that they train their model even when you pay. So yeah.. I'd rather not use their "evil"
zoogeny 12 minutes ago [-]
Google making Gemini 2.5 Pro (Experimental) free was a big deal. I haven't tried the more expensive OpenAI models, so I can only compare against the free models of theirs I've used in the past.

Gemini 2.5 Pro is so much of a step up (IME) that I've become sold on Google's models in general. It not only is smarter than me on most of the subjects I engage with it, it also isn't completely obsequious. The model pushes back on me rather than contorting itself to find a way to agree.

100% of my casual AI usage is now in Gemini, and I look forward to asking it questions on deep topics because it consistently provides me with insight. I am building new tools with a mind to optimize my usage and increase its value to me.

arnaudsm 2 hours ago [-]
Gemini flash models have the least hype, but in my experience in production have the best bang for the buck and multimodal tooling.

Google is silently winning the AI race.

statements 1 hours ago [-]
Absolutely agree. Granted, it is task dependent. But when it comes to classification and attribute extraction, I've been using 2.0 Flash with huge access across massive datasets. It would not be even viable cost wise with other models.
spruce_tips 2 hours ago [-]
i have a high volume task i wrote an eval for and was pleasantly surprised at 2.0 flash's cost to value ratio especially compared to gpt4.1-mini/nano

model | accuracy | input price | output price
Gemini Flash 2.0 Lite | 67% | $0.075 | $0.30
Gemini Flash 2.0 | 93% | $0.10 | $0.40
GPT-4.1-mini | 93% | $0.40 | $1.60
GPT-4.1-nano | 43% | $0.10 | $0.40

excited to try out 2.5 flash
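An eval like the one above usually boils down to a labels-vs-predictions loop. A minimal sketch with the model call stubbed out (the keyword rule stands in for a real API call to any of the models in the table):

```python
# Minimal accuracy-eval sketch: `call_model` is a stand-in for a real
# LLM API call; here a trivial keyword rule plays the classifier.
def call_model(text: str) -> str:
    return "spam" if "free money" in text.lower() else "ham"

def accuracy(dataset: list[tuple[str, str]]) -> float:
    correct = sum(call_model(text) == label for text, label in dataset)
    return correct / len(dataset)

data = [
    ("Claim your FREE MONEY now", "spam"),
    ("Meeting moved to 3pm", "ham"),
    ("free money inside!!!", "spam"),
    ("Lunch tomorrow?", "ham"),
]
print(f"accuracy: {accuracy(data):.0%}")  # → accuracy: 100%
```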

jay_kyburz 32 minutes ago [-]
Can I ask a serious question? What task are you working on where it's ok to get a 7% error rate? I can't get my head around how this can be used.
omneity 26 minutes ago [-]
In my case, I have workloads like this where it’s possible to verify the correctness of the result after inference, so any success rate is better than 0 as it’s possible to identify the “good ones”.
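That generate-then-verify pattern can be sketched as a retry loop. Here the LLM call is a random stub and the verifier a trivial parity check, both placeholders for a real model and a real correctness test:

```python
import random

def generate(task: int) -> int:
    # Stand-in for an LLM call that's right ~90% of the time.
    return task * 2 if random.random() < 0.9 else task * 2 + 1

def verify(answer: int) -> bool:
    # Cheap deterministic check (stands in for schema validation,
    # re-computation, unit tests, etc.).
    return answer % 2 == 0

def solve(task: int, max_tries: int = 10):
    for _ in range(max_tries):
        answer = generate(task)
        if verify(answer):
            return answer
    return None  # give up: no verifiable answer

random.seed(0)
print([solve(n) for n in range(5)])  # → [0, 2, 4, 6, 8]
```

With a verifier in place, even a noisy success rate yields only "good ones" in the output.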
spruce_tips 28 minutes ago [-]
low stakes text classification, but it's something that needs to be done and couldn't be done in reasonable time frames or at reasonable price points by humans
dist-epoch 14 minutes ago [-]
Not OP, but for stuff like social networks spam/manipulation 7% error rate is fine
ghurtado 34 minutes ago [-]
I know it's a single data point, but yesterday I showed it a diagram of my fairly complex micropython program (including RP2 specific features, DMA and PIO), and it was able to describe in detail not just the structure of the program, but also exactly what it does and how it does it. This is before seeing a single line of code, just going by boxes and arrows.

The other AIs I have shown the same diagram to, have all struggled to make sense of it.

Layvier 2 hours ago [-]
Absolutely. So many use cases for it, and it's so cheap/fast/reliable
SparkyMcUnicorn 1 hours ago [-]
And stellar OCR performance. Flash 2.0 is cheaper and more accurate than AWS Textract, Google Document AI, etc.

Not only in benchmarks[0], but in my own production usage.

[0] https://getomni.ai/ocr-benchmark

danielbln 1 hours ago [-]
I want to use these almost-too-cheap-to-meter models like Flash more. What are some interesting use cases for them?
42lux 2 hours ago [-]
The API is free, and it's great for everyday tasks. So yes there is no better bang for the buck.
drusepth 1 hours ago [-]
Wait, the API is free? I thought you had to use their web interface for it to be free. How do you use the API for free?
dcre 1 hours ago [-]
You can get an API key and they don't bill you. Free tier rate limits for some models (even decent ones like Gemini 2.0 Flash) are quite high.

https://ai.google.dev/gemini-api/docs/pricing

https://ai.google.dev/gemini-api/docs/rate-limits#free-tier
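For reference, the free tier is just an API key against the standard generateContent REST endpoint those docs describe. A sketch that builds the request without sending it (the key and prompt are placeholders; uncomment the call with a real key):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"   # placeholder; free keys come from AI Studio
MODEL = "gemini-2.0-flash"
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       f"models/{MODEL}:generateContent?key={API_KEY}")

body = {"contents": [{"parts": [{"text": "Say hello in one word."}]}]}
req = urllib.request.Request(
    URL,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a real key to actually call the API:
# with urllib.request.urlopen(req) as resp:
#     out = json.load(resp)
#     print(out["candidates"][0]["content"]["parts"][0]["text"])
print(req.get_method(), req.full_url.split("?")[0])
```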

NoahZuniga 14 minutes ago [-]
The rate limits I've encountered with free api keys have been way lower than the limits advertised.
midasz 1 hours ago [-]
I use Gemini 2.5 pro experimental via openrouter in my openwebui for free. Was using sonnet 3.7 but I don't notice much difference so just default to the free thing now.
spruce_tips 1 hours ago [-]
create an api key and don't set up billing. pretty low rate limits and they use your data
mlboss 1 hours ago [-]
using aistudio.google.com
xnx 1 hours ago [-]
Shhhh. You're going to give away the secret weapon!
GaggiX 55 minutes ago [-]
Flash models are really good even for an end user, because of how fast they are and how well they perform.
rvz 1 hours ago [-]
Google has been winning the AI race ever since DeepMind was properly put to use to develop their AI models, instead of the team that built Bard (the Google AI team).
belter 2 hours ago [-]
> Google is silently winning the AI race.

That is what we keep hearing here... I cancelled my account after the last Gemini, and can't help noticing the new one they are offering for free...

arnaudsm 2 hours ago [-]
Sorry I was talking of B2B APIs for my YC startup. Gemini is still far behind for consumers indeed.
JeremyNT 30 minutes ago [-]
I use Gemini almost exclusively as a normal user. What am I missing out on that they are far behind on?

It seems shockingly good and I've watched it get much better up to 2.5 Pro.

arnaudsm 11 minutes ago [-]
Mostly brand recognition and the earlier Geminis had more refusals.

As a consumer, I also really miss the Advanced voice mode of ChatGPT, which is the most transformative tech in my daily life. It's the only frontier model with true audio-to-audio.

gambiting 1 hours ago [-]
In my experience they are as dumb as a bag of bricks. The other day I asked "can you edit a picture if I upload one"

And it replied "sure, here is a picture of a photo editing prompt:"

https://g.co/gemini/share/5e298e7d7613

It's like "baby's first AI". The only good thing about it is that it's free.

JFingleton 51 minutes ago [-]
Prompt engineering is a thing.

Learning how to "speak llm" will give you great results. There's loads of online resources that will teach you. Think of it like learning a new API.

ghurtado 29 minutes ago [-]
> in my experience they are as dumb as a bag of bricks

In my experience, anyone that describes LLMs using terms of actual human intelligence is bound to struggle using the tool.

Sometimes I wonder if these people enjoy feeling "smarter" when the LLM fails to give them what they want.

mdp2021 17 seconds ago [-]
If those people are a subset of those who demand actual intelligence, they will very often feel frustrated.
nowittyusername 28 minutes ago [-]
It's because Google hasn't realized the value of training the model on information about its own capabilities and metadata. My biggest pet peeve about Google and the way they train these models.
Fairburn 2 hours ago [-]
Sorry, but no. Gemini isn't the fastest horse, yet. And its use within their ecosystem means it isn't geared to the masses outside of their bubble. They are not leading the race, but they are a contender.
minimaxir 6 minutes ago [-]
One hidden note from Gemini 2.5 Flash when diving deep into the documentation: for image inputs, not only can the model be instructed to generate 2D bounding boxes of relevant subjects, but it can also create segmentation masks! https://ai.google.dev/gemini-api/docs/image-understanding#se...

At this price point with the Flash model, creating segmentation masks is pretty nifty.

The segmentation masks are a bit of a galaxy brain implementation by generating a b64 string representing the mask: https://colab.research.google.com/github/google-gemini/cookb...

I am trying to test it in AI Studio but it sometimes errors out, likely because it tries to decode the b64 lol.
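Per the linked docs and cookbook, each detected item carries a `box_2d`, a `label`, and a base64-encoded PNG in a `mask` field. A sketch of the decode step, using a fabricated payload since no API call is made here (the data-URI prefix handling is an assumption about the response shape):

```python
import base64

def decode_mask(mask_field: str) -> bytes:
    """Strip an optional data-URI prefix and base64-decode a PNG mask."""
    prefix = "data:image/png;base64,"
    if mask_field.startswith(prefix):
        mask_field = mask_field[len(prefix):]
    png = base64.b64decode(mask_field)
    if png[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("decoded payload is not a PNG")
    return png  # hand off to PIL/OpenCV to rasterize into a boolean mask

# Fabricated item shaped like a documented response entry:
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
item = {
    "box_2d": [100, 80, 400, 360],
    "label": "cat",
    "mask": "data:image/png;base64," + base64.b64encode(fake_png).decode(),
}
print(len(decode_mask(item["mask"])), "bytes of PNG mask")
```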

xbmcuser 1 hours ago [-]
For a non-programmer like me, Google is becoming shockingly good. It gives working code on the first try. I was playing around with it and asked it to write code to scrape some data off a website to analyse. I was expecting it to write something that would scrape the data, and that I would later upload the data to it to analyse. But it actually wrote code that scraped and analysed the data. It was basic categorizing and counting of the data, but I was not expecting it to do that.
kccqzy 1 hours ago [-]
That's the opposite experience of my wife who's in tech but also a non programmer. She wanted to ask Gemini to write code to do some basic data analysis things in a more automated way than Excel. More than once, Gemini wrote a long bash script where some sed invocations are just plain wrong. More than once I've had to debug Gemini-written bash scripts. As a programmer I knew how bash scripts aren't great for readability so I told my wife to ask Gemini to write Python. It resulted in higher code quality, but still contained bugs that are impossible for a non programmer to fix. Sometimes asking a follow up about the bugs would cause Gemini to fix it, but doing so repeatedly will result in Gemini forgetting what's being asked or simply throwing an internal error.

Currently IMO you have to be a programmer to use Gemini to write programs effectively.

drob518 45 minutes ago [-]
IMO, the only thing that’s consistent about AIs is how inconsistent they are. Sometimes, I ask them to write code and I’m shocked at how well it works. Other times, I feel like I’m trying to explain to a 5-year-old Alzheimer’s patient what I want and it just can’t seem to do the simplest stuff. And it’s the same AI in both cases.
sbarre 1 hours ago [-]
I've found that good prompting isn't just about asking for results but also giving hints/advice/direction on how to go about the work.

I suspect that if Gemini is giving you bash scripts it's because you're not giving it enough direction. As you pointed out, telling it to use Python, or giving it more expectations about how to go about the work or how the output should be structured, will give better results.

When I am prompting for technical or data-driven work, I tend to almost walk through what I imagine the process would be, including steps, tools, etc...

xbmcuser 60 minutes ago [-]
I had similar experiences a few months back; that is why I am saying it is becoming shockingly good. 2.5 is a lot better than the 2.0 version. Another thing I have realized: just like with Google search in the past, your query has a lot to do with the results you get. So giving an example of what you want works well at getting better results.
ac29 47 minutes ago [-]
> I am saying it is becoming shockingly good the 2.5 is a lot better than the 2.0 version

Are you specifically talking about 2.5 Flash? It only came out an hour ago; I don't know how you would have enough experience with it already to come to your conclusion.

(I am very impressed with 2.5 Pro, but that is a different model that's been available for several weeks now)

xbmcuser 31 minutes ago [-]
I am talking about 2.5 Pro
SweetSoftPillow 44 minutes ago [-]
It must have something to do with the way your wife is prompting. I've noticed this with my friends too. I usually get working code from Gemini 2.5 Pro on the first try, and with a couple of follow-up prompts, it often improves significantly, while my friends seem to struggle communicating their ideas to the AI and get worse results.

Good news: Prompting is a skill you can develop.

gregorygoc 10 minutes ago [-]
Is there a website with off the shelf prompts that work?
halfmatthalfcat 38 minutes ago [-]
Or we can just learn to write it ourselves in the same amount of time /shrug
999900000999 54 minutes ago [-]
Let's hope that's the case for a while.

I want to be able to just tell chat GPT or whatever to create a full project for me, but I know the moment it can do that without any human intervention, I won't be able to find a job.

Workaccount2 32 minutes ago [-]
There is definitely an art to doing it, but the ability is definitely there even if you don't know the language at all.

I have a few programs now that are written in Python (two by Claude 3.7, one by Gemini 2.5) and used for business daily, and I can tell you I didn't, and frankly couldn't, check a single line of code. One of them is ~500 LOC, the other two are 2200-2700 LOC.

ant6n 34 minutes ago [-]
Last time I tried Gemini, it messed with my google photo data plan and family sharing. I wish I could try the AI separate from my Google account.
serjester 14 minutes ago [-]
Just ran it on one of our internal PDF (3 pages, medium difficulty) to json benchmarks:

gemini-flash-2.0: ~60% accuracy, 6,250 pages per dollar

gemini-2.5-flash-preview (no thinking): ~80% accuracy, 1,700 pages per dollar

gemini-2.5-flash-preview (with thinking): ~80% accuracy (not sure what's going on here), 350 pages per dollar

gemini-flash-2.5: ~90% accuracy, 150 pages per dollar

I do wish they separated the thinking variant from the regular one - it's incredibly confusing when a model parameter dramatically impacts pricing.
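For anyone reproducing numbers like these, "pages per dollar" falls straight out of token prices and per-page token counts. A sketch with purely illustrative numbers (neither the prices nor the token counts are from the benchmark above):

```python
def pages_per_dollar(in_tok_per_page, out_tok_per_page, in_price, out_price):
    """Prices are per 1M tokens; returns pages processable for one dollar."""
    cost_per_page = (in_tok_per_page * in_price
                     + out_tok_per_page * out_price) / 1e6
    return 1 / cost_per_page

# Illustrative only: ~500 input and ~700 output tokens per PDF page at
# $0.15/M input and $0.60/M output (assumed numbers, not the benchmark's).
print(round(pages_per_dollar(500, 700, 0.15, 0.60)), "pages per dollar")
```

Thinking tokens count against the output side, which is why the "with thinking" row drops so sharply.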

ValveFan6969 12 minutes ago [-]
I have been having similar performance issues, I believe they intentionally made a worse model (Gemini 2.5) to get more money out of you. However, there is a way where you can make money off of Gemini 2.5.

If you set the thinking parameter lower and lower, you can make the model spew absolute nonsense for the first response. It costs 10 cents per input / output, and sometimes you get a response that was just so bad your clients will ask for more and more corrections.

xnx 2 hours ago [-]
50% price increase from Gemini 2.0 Flash. That sounds like a lot, but Flash is still so cheap when compared to other models of this (or lesser) quality. https://developers.googleblog.com/en/start-building-with-gem...
akudha 2 hours ago [-]
Is this cheaper than DeepSeek? Am I reading this right?
swyx 1 hours ago [-]
priced pretty much in line with the price/ELO Pareto frontier https://x.com/swyx/status/1912959140743586206/photo/1
xnx 59 minutes ago [-]
Love that chart! Am I imagining that I saw a version of that somewhere that even showed how the boundary has moved out over time?
Tiberium 2 hours ago [-]
del
Havoc 1 hours ago [-]
You may want to consult Gemini on those percentage calcs: $0.10 to $0.15 is not 25%.
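For the record, the arithmetic in question:

```python
# Relative price increase from $0.10 to $0.15 per million tokens.
old, new = 0.10, 0.15
print(f"{(new - old) / old:.0%}")  # → 50%
```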
alecco 1 hours ago [-]
Gemini models are very good but in my experience they tend to overdo the problems. When I give it things for context and something to rework, Gemini often reworks the problem.

For software it is barely useful because you want small commits for specific fixes not a whole refactor/rewrite. I tried many prompts but it's hard. Even when I give it function signatures of the APIs the code I want to fix uses, Gemini rewrites the API functions.

If anybody knows a prompt hack to avoid this, I'm all ears. Meanwhile I'm staying with Claude Pro.

byearthithatius 41 minutes ago [-]
Yes, it will add INSANE amounts of "robust error handling" to quick scripts where I can be confident about assumptions. This turns my clean 40 lines of Python where I KNOW the JSONL I am parsing is valid into 200+ lines filled with ten new try except statements. Even when I tell it not to do this, it loves to "find and help" in other ways. Quite annoying. But overall it is pretty dang good. It even spotted a bug I missed the other day in a big 400+ line complex data processing file.
zhengyi13 6 minutes ago [-]
I wonder how much of that sort of thing is driven by having trained their models on their own internal codebases? Because if that's the case, careful and defensive being the default would be unsurprising.
byefruit 2 hours ago [-]
It's interesting that there's nearly a 6x price difference between reasoning and no reasoning.

This implies it's not a hybrid model that can just skip reasoning steps if requested.

Anyone know what else they might be doing?

Reasoning means contexts will be longer (for thinking tokens) and there's an increase in cost to inference with a longer context but it's not going to be 6x.

Or is it just market pricing?

vineyardmike 2 hours ago [-]
Based on their graph, it does look explicitly priced along their “Pareto Frontier” curve. I’m guessing that is guiding the price more than their underlying costs.

It’s smart because it gives them room to drop prices later and compete once other companies actually get to a similar quality.

jsnell 1 hours ago [-]
> This implies it's not a hybrid model that can just skip reasoning steps if requested.

It clearly is, since most of the post is dedicated to the tunability (both manual and automatic) of the reasoning budget.

I don't know what they're doing with this pricing, and the blog post does not do a good job explaining.

Could it be that they're not counting thinking tokens as output tokens (since you don't get access to the full thinking trace anyway), and this is basically amortizing the thinking-token spend over the actual output tokens? That doesn't make sense either, because then the user has no incentive to use anything except 0/max thinking budgets.

RobinL 48 minutes ago [-]
Does anyone know how this pricing works? Suppose I have a classification prompt where I need the response to be a binary yes/no. I need one token of output, but reasoning will obviously add far more than 6 additional tokens. Is it still a 6x price multiplier? That doesn't seem to make sense, but nor does paying 6x more for every token, including reasoning ones.
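One plausible reading (an assumption; the post doesn't spell it out) is that thinking tokens bill at the output rate, so a one-token yes/no answer still pays for the whole reasoning trace. A sketch with made-up prices:

```python
def cost(in_tokens, out_tokens, think_tokens,
         in_price=0.15, out_price=0.60, think_mult=6.0):
    """Dollar cost; prices per 1M tokens. Assumes thinking tokens bill
    at a 6x output rate -- illustrative numbers, not Google's rate card."""
    return (in_tokens * in_price
            + out_tokens * out_price
            + think_tokens * out_price * think_mult) / 1e6

# A 1-token yes/no answer where 800 thinking tokens dominate the bill:
print(f"${cost(2000, 1, 800):.6f}")
```

Under that reading, the multiplier applies to the (large) thinking trace, not to your single output token.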
statements 2 hours ago [-]
Interesting to note that this might be the only model with a knowledge cutoff as recent as January 2025.
Tiberium 2 hours ago [-]
Gemini 2.5 Pro has the same knowledge cutoff specified, but in reality on more niche topics it's still limited to ~middle of 2024.
brightball 2 hours ago [-]
Isn't Grok 3 basically real time now?
jiocrag 9 minutes ago [-]
Not at all. The model weights and training data remain the same, it's just RAG'ing real-time twitter data into its context window when returning results. It's like a worse version of Perplexity.
bearjaws 60 minutes ago [-]
No LLM is real time, and in fact, even a 2025 cutoff isn't entirely realistic. Without guidance to, say, a new version of a framework, it will frequently "reference" documentation from old versions and use that.

It's somewhat real time when it searches the web, of course that data is getting populated into context rather than in training.

Tiberium 1 hours ago [-]
That's the web version (which has tools like search plugged in), other models in their official frontends (Gemini on gemini.google.com, GPT/o models on chatgpt.com) are also "real time". But when served over API, most of those models are just static.
Workaccount2 2 hours ago [-]
OpenAI might win the college students but it looks like Google will lock in enterprise.
gundmc 2 hours ago [-]
Funny you should say that. Google just announced today that they are giving all college students one year of free Gemini advanced. I wonder how much that will actually move the needle among the youth.
Workaccount2 2 hours ago [-]
My guess is that they will use it and still call it "ChatGPT"...
tantalor 41 minutes ago [-]
Pass the Kleenex. Can I get a Band-Aid? Here's a Sharpie. I need a Chapstick. Let me Xerox that. Toss me that Frisbee.
drob518 37 minutes ago [-]
Exactly.
xnx 1 hours ago [-]
Chat Gemini Pretrained Transformer
anovick 13 minutes ago [-]
* Only in the U.S.
drob518 37 minutes ago [-]
And every professor just groaned at the thought of having to read yet another AI-generated term paper.
jay_kyburz 21 minutes ago [-]
They should just get AI to mark them. I genuinely think this is one thing AI would do better than humans.
xnx 2 hours ago [-]
ChatGPT seems to have a name recognition / first-mover advantage with college students now, but is there any reason to think that will stick when today's high school students are using Gemini on their Chromebooks?
edaemon 29 minutes ago [-]
It seems more and more like AI is less of a product and more of a feature. Most people aren't going to care or even know about the model or the company who made it, they're just going to use the AI features built into the products they already use.
asadm 1 hours ago [-]
funny thing about younglings, they will migrate to something else as fast as they came to you.
drob518 36 minutes ago [-]
I read about that on Facebook.
Oras 1 hours ago [-]
Enterprise has already been won by Microsoft (Azure), which runs on OpenAI.
r00fus 21 minutes ago [-]
That isn't what I'm seeing with my clientele (lots of startups and mature non-tech companies). Most are using Azure but very few have started to engage AI outside the periphery.
superfrank 2 hours ago [-]
Is there really lock in with AI models?

I built a product that uses an LLM and I got curious about the quality of the output from different models. It took me a weekend to go from just using OpenAI's API to having Gemini, Claude, and DeepSeek all as options, and a lot of that time was research on which model from each provider I wanted to use.
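That kind of weekend port is possible because most apps reduce every provider to "prompt in, text out" behind a thin dispatch layer. A sketch with stubbed backends (real implementations would wrap each vendor's SDK where the stubs are):

```python
from typing import Callable

# Each backend reduces to "prompt in, text out". Real implementations
# would wrap the OpenAI / Gemini / Anthropic / DeepSeek SDKs.
def openai_stub(prompt: str) -> str:
    return f"[openai] {prompt}"

def gemini_stub(prompt: str) -> str:
    return f"[gemini] {prompt}"

PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": openai_stub,
    "gemini": gemini_stub,
}

def complete(prompt: str, provider: str = "openai") -> str:
    return PROVIDERS[provider](prompt)

print(complete("hello", provider="gemini"))  # → [gemini] hello
```

Switching providers becomes a one-line config change, which is exactly why lock-in at the model layer is weak.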

drob518 32 minutes ago [-]
There isn’t much of a lock-in, and that’s part of the problem the industry is going to face. Everyone is spending gobs of money on training and if someone else creates a better one next week, the users can just swap it right in. We’re going to have another tech crash for AI companies, similar to what happened in 2001 for .coms. Some will be winners but they won’t all be.
pydry 1 hours ago [-]
For enterprise practically any SaaS gets used as one more thing to lock them into a platform they already have a relationship with (either AWS, GCP or Azure).

It's actually pretty dangerous for the industry to have this much vertical integration. Tech could end up like the car industry.

superfrank 1 hours ago [-]
I'm aware of that. I'm an EM for a large tech company that sells multiple enterprise SaaS products.

You're right that the lock in happens because of relationships, but most big enterprise SaaS companies have relationships with multiple vendors. My company has relationships with AWS, Azure, and GCP, and we're currently using services from all of them in different products. Even on my specific product we're using all three.

When you've already got those relationships, the lock in is more about switching costs. The time it takes to switch, the knowledge needed to train people internally on the differences after the switch, and the actual cost of the new service vs the old one.

With AI models the time to switch from OpenAI to Gemini is negligible and there's little retraining needed. If the Google models (now or in the future) are comparable in price and do a better job than OpenAI models, I don't see where the lock in is coming from.

ein0p 1 hours ago [-]
How will it lock in the enterprise if its market share of enterprise customers is half that of Azure (Azure also sells OpenAI inference, btw), and one third that of AWS?
kccqzy 1 hours ago [-]
The same reason why people enjoy BigQuery enough that their only use of GCP is BigQuery while they put their general compute spend on AWS.

In other words, I believe talking about cloud market share as a whole is misleading. One cloud could have one product that's so compelling that people use that one product even when they use other clouds for more commoditized products.

AbuAssar 38 minutes ago [-]
I noticed that OpenAI doesn't compare its models to third-party models in its announcement posts, unlike Google, Meta, and the others.
ks2048 1 hours ago [-]
If this announcement is targeting people not up-to-date on the models available, I think they should say what "flash" means. Is there a "Gemini (non-flash)"?

I see the 4 Google model names in the chart here. Are these 4 the main "families" of models to choose from?

- Gemini-Pro-Preview

- Gemini-Flash-Preview

- Gemini-Flash

- Gemini-Flash-Lite

mwest217 48 minutes ago [-]
Gemini has had 4 families of models, in order of decreasing size:

- Ultra

- Pro

- Flash

- Flash-Lite

Versions with `-Preview` at the end haven't had their "official release" and are technically in some form of "early access". I'm not totally clear on exactly what that means, though, given that they're fully available and, as of 2.5 Pro Preview, have pricing attached. Earlier versions were free during Preview but had pretty strict rate limiting; now it seems that Preview models are more or less fully usable.

jsnell 6 minutes ago [-]
The free-with-small-rate-limits designator was "experimental", not "preview".

I think the distinction between preview and full release is that the preview models have no guarantees on how long they'll be available, while the full release comes with a pre-set discontinuation date. So if you want stability for a production app, you wouldn't want to use a preview model.

drob518 29 minutes ago [-]
Is GMail still in beta?
mark_l_watson 35 minutes ago [-]
Nice! Low price, even with reasoning enabled. I have been working on a short new book titled “Practical AI with Google: A Solo Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs” but with all of Google’s recent announcements it might not be a short book.
hmaxwell 2 hours ago [-]
I did some testing this morning:

Prompt: "can you find any mistakes on my codebase? I put one in there on purpose" + 70,000 tokens of codebase where in one line I have an include for a non-existent file.

Claude 3.7 Sonnet (Try 1 - new chat) Claimed it Found a race condition in the api of the admin interface that would be triggered if two admins were to change the room order at the same time. Claude suggested I group all sql queries in a single transaction. I looked at the code and found that it already used a transaction for all queries. I said: the order_update api is already done with a transaction. Claude replied: "You're absolutely right, and I apologize for my mistake. I was incorrect to claim there was a race condition issue. The transaction ensures atomicity and consistency of the updates, and the SQL queries are properly structured for their intended purpose."

Claude 3.7 Sonnet (Try 2 - new chat) Found a comment in the admin ui javascript code that suggested a potential alternative to event handler cleanup that was not implemented because I decided to go with a cleaner route. Then asked "Is this the issue you intentionally included, or would you like me to look for other potential problems?" I said: "The comment merely suggests an alternative, right?" claude said: "Yes, you're absolutely right. The comment is merely suggesting an alternative approach that isn't being used in the code, rather than indicating a mistake. So there's no actual bug or mistake in this part of the code - just documentation of different possible approaches. I apologize for misinterpreting this as an issue!"

Claude 3.7 Sonnet (Try 3 - new chat) When processing items out of the database to generate QR codes in the admin interface, Claude says that my code both attempts to generate QR codes with undefined data AS WELL AS saying that my error handling skips undefined data. Claude contradicts itself within 2 sentences. When asking about clarification Claude replies: Looking at the code more carefully, I see that the code actually has proper error handling. I incorrectly stated that it "still attempts to call generateQRCode()" in the first part of my analysis, which was wrong. The code properly handles the case when there's no data-room attribute.

Gemini Advanced 2.5 Pro (Try 1 - new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.

Gemini Advanced 2.5 Pro (Try 2 - new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.

Gemini Advanced 2.5 Pro (Try 3 - new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.

o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you submitted was too long, please reload the conversation and submit something shorter."

Tiberium 2 hours ago [-]
The thread is about 2.5 Flash though, not 2.5 Pro. Maybe you can try again with 2.5 Flash specifically? Even though it's a small model.
danielbln 1 hours ago [-]
Those responses are very Claude, too. 3.7 has powered our agentic workflows for weeks, but I've been using almost only Gemini for the last week and feel the output is generally better. It's gotten much better at agentic workflows (using 2.0 in an agent setup was not working well at all) and I prefer its tuning over Claude's: more to the point and less meandering.
rendang 1 hours ago [-]
3 different answers in 3 tries for Claude? Makes me curious how many times you'd get the same answer if you asked 10/20/100 times
airstrike 2 hours ago [-]
Have you tried Claude Code?
punkpeye 2 hours ago [-]
This is cool, but the rate limits on all of these preview models are a PITA
Layvier 2 hours ago [-]
Agreed, it's not even possible to run an eval dataset. If someone from Google sees this, please at least increase the burst rate limit
punkpeye 1 hours ago [-]
It is not without rate limits, but we do have elevated limits for our accounts through:

https://glama.ai/models/gemini-2.5-flash-preview-04-17

So if you just want to run evals, that should do it.

Though the first couple of days after a model comes out are usually pretty rough because everyone tries to run their evals.

Layvier 6 minutes ago [-]
That's very interesting, thanks for sharing!
punkpeye 1 hours ago [-]
What I am noticing with every new Gemini model that comes out is that the time to first token (TTFT) is not great. I guess it is because they gradually transfer compute power from old models to new models as demand increases.
Filligree 52 minutes ago [-]
If you’re imagining that 2.5Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible.

It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
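The latency-throughput tradeoff above can be made concrete by measuring TTFT from a streamed response. A sketch with a stub generator standing in for the real API (the queueing delay simulates batch formation; every name here is made up for illustration):

```python
import time

def measure_ttft(stream):
    """Return (seconds until first token, total token count) for an iterator."""
    start = time.monotonic()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.monotonic() - start
        count += 1
    return ttft, count

def stub_stream(queue_delay=0.05, n_tokens=10):
    """Stand-in for a streaming API: a queueing delay, then fast tokens."""
    time.sleep(queue_delay)  # simulated batch-formation / queueing latency
    for i in range(n_tokens):
        yield f"tok{i}"

ttft, n = measure_ttft(stub_stream())
```

A server that batches aggressively pushes the delay into `queue_delay` (hurting TTFT) while improving tokens-per-second once generation starts.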

__alexs 1 hours ago [-]
Does billing for the API actually work properly yet?
charcircuit 52 minutes ago [-]
500 RPD for the free tier is good enough for my coding needs. Nice.
cynicalpeace 56 minutes ago [-]
1. The main transformative aspect of LLMs has been in writing code.

2. LLMs have been less transformative in 2025 than we anticipated back in late 2022.

3. LLMs are unlikely to be very transformative to society, even as their intelligence increases, because intelligence is a minor changemaker in society. Bigger changemakers are motivation, courage, desire, taste, power, sex and hunger.

4. LLMs are unlikely to develop these more important traits because they are trained on text, not evolved in a rigamarole of ecological challenges.

AStonesThrow 60 minutes ago [-]
I've been leveraging the services of 3 LLMs, mainly: Meta, Gemini, and Copilot.

It depends on what I'm asking. If I'm looking for answers in the realm of history or culture, religion, or I want something creative such as a cute limerick, or a song or dramatic script, I'll ask Copilot. Currently, Copilot has two modes: "Quick Answer"; or "Think Deeply", if you want to wait about 30 seconds for a good answer.

If I want info on a product, a business, an industry or a field of employment, or on education, technology, etc., I'll inquire of Gemini.

Both Copilot and Gemini have interactive voice conversation modes. Thankfully, they will also write a transcript of what we said. They also eagerly attempt to engage the user with further questions and followups, with open questions such as "so what's on your mind tonight?"

And if I want to know about pop stars, film actors, the social world or something related to tourism or recreation in general, I can ask Meta's AI through [Facebook] Messenger.

One thing I found to be extremely helpful and accurate was Gemini's tax advice. I mean, it was way better than human beings at the entry/poverty level. Commercial tax advisors, even when I'd paid for the Premium Deluxe Tax Software from the Biggest Name, they just went to Google stuff for me. I mean, they didn't even seem to know where stuff was on irs.gov. When I asked for a virtual or phone appointment, they were no-shows, with a litany of excuses. I visited 3 offices in person; the first two were closed, and the third one basically served Navajos living off the reservation.

So when I asked Gemini about tax information -- simple stuff like the terminology, definitions, categories of income, and things like that -- Gemini was perfectly capable of giving lucid answers. And citing its sources, so I could immediately go find the IRS.GOV publication and read it "from the horse's mouth".

Oftentimes I'll ask an LLM just to jog my memory or inform me of what specific terminology I should use. Like "Hey Gemini, what's the PDU for Ethernet called?" and when Gemini says it's a "frame" then I have that search term I can plug into Wikipedia for further research. Or, for an introduction or overview to topics I'm unfamiliar with.

LLMs are an important evolutionary step in the general-purpose "search engine" industry. One problem was, you see, that it was dangerous, annoying, or risky to go Googling around and click on all those tempting sites. Google knew this: the dot-com sites and all the SEO sites that surfaced to the top were traps, they were bait, some were outright scams.

So the LLM providers are showing us that we can stay safe in a sandbox, without clicking external links, without coughing up information about our interests and setting cookies and revealing our IPv6 addresses: we can safely ask a local LLM, or an LLM at a trusted service provider, about whatever piques our fancy. And I am glad for this. I saw y'all complaining about how every search engine was worthless, and the Internet was clogged with blogspam, and there was no real information anymore. Well, perhaps LLMs, for now, are a safe space, a sandbox to play in, where I don't need to worry about drive-by zero-click malware, or being inundated with Joomla ads, or popups. For now.

ein0p 2 hours ago [-]
Absolutely decimated on metrics by o4-mini, straight out of the gate, and not even that much cheaper on output tokens (o4-mini's thinking can't be turned off IIRC).
vessenes 1 hours ago [-]
o4-mini costs 8x as much as 2.5 flash. I believe its useful context window is also shorter, although I haven't verified this directly.
mccraveiro 1 hours ago [-]
2.5 flash with reasoning is just 20% cheaper than o4-mini
vessenes 34 minutes ago [-]
Good point: reasoning costs more. It's also impossible to tell, without testing, how verbose the reasoning mode is.
rfw300 1 hours ago [-]
o4-mini does look to be a better model, but this is actually a lot cheaper! It's ~7x cheaper for both input and output tokens.
ein0p 45 minutes ago [-]
These small models only make sense with "thinking" enabled. And once you enable that, much of the cost advantage vanishes, for output tokens.
overfeed 33 minutes ago [-]
> These small models only make sense with "thinking" enabled

This entirely depends on your use-cases.
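The pricing claims in this subthread can be checked with quick arithmetic, assuming the launch list prices per 1M tokens quoted at the time ($0.15 input / $0.60 non-thinking output / $3.50 thinking output for 2.5 Flash; $1.10 input / $4.40 output for o4-mini — treat these figures as assumptions and check current price pages):

```python
# All prices are per 1M tokens and assumed from the launch announcements.
flash_in, flash_out, flash_think_out = 0.15, 0.60, 3.50
o4mini_in, o4mini_out = 1.10, 4.40

input_ratio = o4mini_in / flash_in                    # ~7.3x cheaper input
output_ratio = o4mini_out / flash_out                 # ~7.3x cheaper non-thinking output
thinking_discount = 1 - flash_think_out / o4mini_out  # ~20% cheaper with thinking on
```

That reconciles both sides of the argument: ~7x cheaper with thinking off, only ~20% cheaper with it on.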

gundmc 2 hours ago [-]
It's good to see some actual competition in this price range! A lot of Flash 2.5's edge will depend on how well the dynamic reasoning works. It's also helpful to have _significantly_ lower input token cost for large-context use cases.
mupuff1234 1 hours ago [-]
Not sure "decimated" is a fitting word for "slightly higher performance on some benchmarks".
fwip 34 minutes ago [-]
Perhaps they were using the original meaning of "one-tenth destroyed." :P
transformi 2 hours ago [-]
It's a bad day over at Google.

First the declaration of an illegal monopoly..

and now... Google’s latest innovation: programmable overthinking.

With Gemini 2.5 Flash, you too can now set a thinking_budget—because nothing says "state-of-the-art AI" like manually capping how long it’s allowed to reason. Truly the dream: debugging a production outage at 2am wondering if your LLM didn’t answer correctly because you cheaped out on tokens. lol.
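For reference, the budget is just a request parameter. A sketch of a `generateContent` request body, assuming the REST field names `generationConfig.thinkingConfig.thinkingBudget` (verify against the current Gemini API reference before relying on them):

```python
import json

# Hedged sketch of a Gemini generateContent request body with a thinking
# budget; the field names are assumptions based on the announcement.
body = {
    "contents": [{"role": "user", "parts": [{"text": "Why did the deploy fail?"}]}],
    "generationConfig": {
        "thinkingConfig": {
            "thinkingBudget": 1024  # max thinking tokens; 0 turns thinking off
        }
    },
}
payload = json.dumps(body)
```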

“Turn thinking off for better performance.” That’s not a model config, that’s a metaphor for Google’s entire AI strategy lately.

At this point, Gemini isn’t an AI product—it’s a latency-cost-quality compromise simulator with a text interface. Meanwhile, OpenAI and Anthropic are out here just… cooking the benchmarks

danielbln 2 hours ago [-]
Google's Gemini 2.5 Pro model is incredibly strong; it's on par with and at times better than Claude 3.7 in coding performance, and being able to ingest entire videos into the context is something I haven't seen elsewhere either. Google AI products have ranged from bad (Bard) to lackluster (Gemini 1.5), but 2.5 is a contender in all dimensions. Google is also the only player that owns the entire stack: research, software, data, and compute hardware. I think they were slow to start but they've closed the gap since.
bsmith 1 hours ago [-]
Using AI to debug code at 2am sounds like pure insanity.