
What AI Chatbots Actually Do With Your Data (And What You're Silently Agreeing To)

Someone's therapy session ended up in Google's search index last summer.

Not because they were hacked. Not because OpenAI had a breach. Because they clicked Share on a conversation and ticked a small opt-in box labeled "Make this chat discoverable," without realizing that meant "findable by any search engine on the planet."

By the time TechCrunch reported on it in July 2025, there were nearly 4,500 ChatGPT conversations indexed by Google. Mental health crises. Legal advice queries. A healthcare professional's sessions with identifying details intact. Employees venting about workplace situations with enough specifics that their employer could theoretically find them. OpenAI called it a "short-lived experiment" and quietly killed the feature. Which, okay, sure. But the problem wasn't really the feature.

The problem is that most people have no idea what happens to a conversation once they hit send.

It's a conversation. Not a contract.

You open ChatGPT, or Gemini, or Claude. You type something — maybe a work problem, maybe a health question, maybe something you haven't told anyone else. You hit enter. The AI responds. You feel like you just had a private exchange.

You didn't.

What you actually did was submit data to a company's servers. That data is stored. In most cases, it can be read by human employees at the company. In many cases, it feeds into future versions of the model — meaning the next iteration of the AI is, in part, shaped by what you shared with the current one. And depending on your settings, that data might stick around for anywhere from 30 days to five years.

I've written about the oversharing trap before — the way social media quietly turns casual disclosure into a permanent data profile. AI chat is the same trap running at a different speed. The interface feels intimate. The model sounds like it's listening. And so people share things they'd never type into a search box.

ChatGPT, Gemini, Claude: what each one actually does

Let me walk through each platform honestly — not to scare you off them, but because "I didn't know" isn't a great privacy strategy.

ChatGPT (OpenAI)

Free and Plus subscribers: by default, your conversations may be used to train OpenAI's models. OpenAI is upfront that human reviewers, both employees and contractors, can access conversations to fix bugs, fine-tune outputs, and investigate safety concerns. There's an opt-out. It's buried in Settings → Data Controls → "Improve the model for everyone." Most people have never seen it.

API users and enterprise accounts get treated differently — that data isn't used for training by default. Which tells you something: the people paying for privacy protection can get it. Everyone else is the product by default.

There's also a breach worth knowing about. In March 2023, a bug in an open-source library that ChatGPT relied on let some users see the titles of other users' conversations, and exposed partial payment details for about 1.2% of ChatGPT Plus subscribers. OpenAI patched it. What they didn't do was notify Italy's data protection authority, which GDPR requires. That omission was one of the reasons Italy's Garante fined them €15 million in December 2024. First major GenAI privacy fine in the EU. Probably not the last.

Google Gemini

Gemini's situation has a wrinkle the others don't. Google's entire business runs on knowing things about you. So when Gemini says it uses "a sample" of free user conversations for training — which it now does, after expanding its policy in September 2025 to include documents, audio, and images — the context matters more than the policy language. This is a company that already has your search history, your email if you use Gmail, your location, your YouTube watch history. Gemini conversations aren't just conversation data. They're another layer on a profile that already knows you pretty well.

For free users, reviewed chats are retained for up to three years. Human reviewers assess whether responses were low quality or harmful. Workspace and enterprise users are protected by contract — Google commits not to use that data for training. But the majority of people using Gemini casually are on the free tier. Opt-out exists. It's not exactly front and center.

Claude (Anthropic)

This one matters because of what changed in 2025. For most of Claude's existence, training on user conversations was opt-in: you had to actively allow it. In August 2025, Anthropic announced they were flipping the default. Free, Pro, and Max users were asked to accept updated terms by September 28, and the training toggle in that prompt was switched on unless you turned it off. If you clicked through without looking, and a lot of people did, your chats became training data and their retention period went from 30 days to five years.

Sound familiar? It's exactly the move I covered when Microsoft shifted how Office 365 handles your data. Change the default, post a blog announcement, see who notices. To Anthropic's credit, they gave a clear deadline and were transparent about what was changing. But "transparent" doesn't mean "hard to miss." The responsibility for catching it still landed on you.

(Business and enterprise plans, plus API access, including through Amazon Bedrock, aren't affected. Same pattern as the others: privacy protection scales with how much you're paying.)

What people are actually typing in there

The July 2025 Google indexing situation gave us a rare, uncomfortable look at what people share with AI chat when they think nobody's watching. Mental health crises. A healthcare professional's therapy conversations with enough identifying details to trace back to a real person. Employees describing internal conflicts. Business strategies. Source code from projects that haven't launched. Legal situations with names attached.

None of it was supposed to be public. Some of it ended up searchable.

Your digital footprint already contains more about you than most people realize — data you generated, data that was inferred, data bought and sold without your involvement. AI conversation logs are a new category on top of that. And unlike a search query, they're detailed. Specific. Descriptive. Written in your own words, describing your actual situation, in the context of whatever you were actually going through.

The asymmetry is what gets me: the AI generates a response and moves on. The company keeps what you wrote for months, sometimes years.

The legal side (briefly, because this isn't a law blog)

Italy's €15M fine on OpenAI was the EU's first GDPR fine against a generative AI company. The charges: processing personal data without a legal basis, failing to report the 2023 breach, and not verifying users' ages. OpenAI called it "disproportionate," saying the fine was nearly 20 times the revenue it made in Italy during the relevant period, and appealed. Whether they win or lose on appeal, the precedent is set.

In the US, the FTC's "Operation AI Comply" has been going after deceptive AI marketing. California's SB 243, effective January 2026, imposes disclosure and safety requirements on companion chatbots. Class actions in California and Massachusetts are testing whether chatbots that share conversation data with third-party providers violate state wiretapping laws without consent. The broader 2026 AI privacy landscape is still taking shape, but most of the EU AI Act's remaining obligations, including those for many high-risk systems, are scheduled to apply from August 2026.

The pressure is building. It's just building slowly. Don't wait for it to catch up.

What you can actually do

None of this requires swearing off AI tools. It just requires being deliberate about them.

Opt out of model training. All three platforms have the setting; it takes about 30 seconds once you know where to look. ChatGPT: Settings → Data Controls → toggle off "Improve the model for everyone." Gemini: myaccount.google.com → Data & Privacy. Claude: Settings → Privacy → toggle off "Help improve Claude." Do it now, while you're thinking about it.

Never paste anything you wouldn't want retained. Passwords, SSNs, medical records, client data, unreleased work, source code — leave it out. If you need help with something sensitive, describe the situation in general terms.
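If you paste text into a chatbot often, a quick local scrub can catch the most obvious identifiers before they leave your machine. Here's a minimal Python sketch, assuming US-style SSNs and phone numbers. The pattern list and the `scrub` function are hypothetical helpers for illustration, not a feature of ChatGPT, Gemini, or Claude, and they won't catch names, addresses, or the surrounding details that make a story identifiable.

```python
import re

# Hypothetical patterns for a quick pre-paste check. They only catch
# well-formed, US-style identifiers; anything else slips through.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace each match with a [LABEL] placeholder, applied in dict order."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "My SSN is 123-45-6789, email jane.doe@example.com, call 555-867-5309."
    print(scrub(sample))  # My SSN is [SSN], email [EMAIL], call [PHONE].
```

Treat this as a safety net, not a filter you can trust: describing the situation in general terms still does more than any pattern list.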

Treat AI chat like email. Email can be subpoenaed, leaked, forwarded, and read by your provider. Approach AI conversations with the same assumption.

Use the API if privacy actually matters for a specific task. OpenAI and Anthropic don't use API data for training by default, and neither does Google on the paid Gemini API (its free tier is a different story). If you're technically comfortable, it's the cleanest option for sensitive work.

Read the "how we use your data" section of each privacy policy once. Just that section. It's usually a few hundred words. Knowing what a service collects before you use it is the same habit I'd recommend for any app, and AI tools are no different. And if you want to understand how much of your information is already out there, it's worth checking that before you add AI conversations on top of it.

OpenAI, Google, and Anthropic aren't running a scam. They're building genuinely useful technology, and they improve it by learning from how people actually use it. That's the trade: your conversations help make the model smarter; a smarter model helps you more. That's legitimate.

The issue isn't that the trade exists. It's that most people are making it without knowing they're making it at all — typing their medical questions, their legal problems, their half-formed thoughts into a box that feels private because it answers in a quiet voice and never judges.

It's not private. Knowing that doesn't mean you have to stop using it. It just means you should know.