They are definitely capable of writing such statements, as you can see in their enterprise products. In my Google Workspace Gemini app it says prominently and clearly:
Your [ORGNAME] chats aren’t used to improve our models
So they definitely understand that people want to hear that their data isn't being used for training, and they know how to say it clearly and reassuringly. Which makes the omission of that in their consumer products more telling in my view.
The linked source says Google's datacenter water consumption in 2023 was 5.2 billion gallons, or ~14 million gallons a day; Microsoft was ~4.7, Facebook 2.6, Apple 2.3, and AWS didn't seem to disclose. These numbers seem to be pulled from what the companies themselves published.
The total for these companies was ~30 million gallons a day. Apply your best guesses as to what fraction of all datacenter usage these companies account for, what fraction of datacenter usage is AI, and how 2025 usage compares to 2023. My guess is it's unlikely to come out to more than 120 million.
I didn't vet this that carefully so take the numbers with a grain of salt, but the rough comparison does seem to hold that Arizona golf courses are larger users of water.
Agricultural numbers are much higher, the California almond industry uses ~4000 million gallons of water a day.
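The unit conversion behind these comparisons is easy to check. A minimal sketch, assuming the 5.2 billion gallons figure is Google's annual 2023 total (the variable names here are my own, not from any source):

```python
# Assumption: 5.2 billion gallons is Google's annual datacenter water use (2023).
google_gal_per_year = 5.2e9
google_mgal_per_day = google_gal_per_year / 365 / 1e6
# ~14.2 million gallons/day, matching the "~14 million" figure above.

# For scale: the ~4000 million gallons/day almond figure is roughly
# 280x Google's daily datacenter consumption under this assumption.
almond_ratio = 4000 / google_mgal_per_day
```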
I was also surprised when someone asked me about AI's water consumption because I had never heard of it being an issue. But a cursory search shows that datacenters use quite a bit more water than I realized, on the order of 1 liter of water per kWh of electricity. I see a lot of talk about how the hyperscalers are doing better than this and are trying to get to net-positive, but everything I saw was about quantifying and optimizing this number rather than debunking it as some sort of myth.
I find "1 liter per kWh" a bit hard to visualize, but when they talk about building a gigawatt datacenter, that's 278 L/s. A typical showerhead is 0.16 L/s. The Californian almond industry apparently uses roughly 200 kL/s averaged over the entire year -- 278 L/s is enough for about 4 square miles of almond orchards.
So it seems like a real thing but maybe not that drastic, especially since I think the hyperscaler numbers are better than this.
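The gigawatt arithmetic above works out like this (a sketch assuming the ~1 L/kWh figure; the showerhead comparison count is my own addition):

```python
# A 1 GW load draws 1e6 kWh every hour; at ~1 L of water per kWh,
# that is 1e6 liters per hour.
power_kw = 1e9 / 1000              # 1 GW in kW
liters_per_hour = power_kw * 1.0   # assumed rate: 1 L per kWh
liters_per_second = liters_per_hour / 3600   # ~278 L/s

# Against a typical 0.16 L/s showerhead, that is on the order of
# 1,700 showers running continuously.
showers = liters_per_second / 0.16
```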
I've found a method that gives me a lot more clarity about a company's privacy policy:
1. Go to their enterprise site
2. See what privacy guarantees they advertise over and above the consumer product
3. Conclusion: those are things that you do not get in the consumer product
These companies do understand what privacy guarantees people want and how to state them in plain language, and they do so when they actually offer them (to their enterprise clients). You can diff this against what they tell consumers to see where they are trying to find wiggle room ("finetuning" is not "training", "ever got free credits" means not "is a paid account", etc.).
For Code Assist, here's their enterprise-oriented page vs their consumer-oriented page:
I agree in general, but I think some important context here is that the author of this post was previously on the OpenAI board (the board that fired Sam Altman).
The worst part to me is the privacy nightmare with AI Studio. It's essentially impossible to tell whether any particular API call will end up being included in their training data, since that depends on account properties stored elsewhere and not visible to the developer. Even a simple property such as "does this account have billing enabled" is oddly difficult to evaluate: their support told me that because my account had at some point received free credits, it was a trial account rather than a billed account, even though I had a credit card attached and was being charged. I don't know if this is true, and there is no way for me to find out.
At some point they updated their privacy policy in this regard, but instead of stating that this will cause them to train on your data, the policy now says both that they will train on this data and that they will not, with no indication of which statement takes precedence over the other.
Unless you are in the free tier of the API we do not train on your data. But let us make it clearer in the policy. If you would like to get more clarity on terms please DM me at @shresbm on X
There are a few conditions that take precedence over having billing enabled and will cause AI Studio to train on your data. This is why I personally use Vertex.
The benchmark numbers don't really mean anything -- Google says Gemini 2.5 Pro scores 86.7 on AIME, beating o3-mini's 86.5, but OpenAI's announcement post [1] reported 87.3 for o3-mini-high, which Gemini 2.5 would lose to. The chart says "All numbers are sourced from providers' self-reported numbers", but the only mention of o3-mini scoring 86.5 that I could find was from this other source [2].
It's "experimental", which means that it is not fully released. In particular, the "experimental" tag means that it is subject to a different privacy policy and that they reserve the right to train on your prompts.
2.0 Pro is also still "experimental", so I agree with GP that it's pretty odd that they are "releasing" the next version despite never having fully released the previous one.