What AI tools do with your data: the full picture
It's Tuesday morning. A consultant on your team pastes a client's draft acquisition proposal into ChatGPT to sharpen the executive summary. A marketing analyst uploads last quarter's sales figures to Claude to generate commentary for a board deck. Your HR manager asks Gemini to rewrite three employee performance reviews to sound more balanced. Each of these takes about 90 seconds and saves real time. None of these people gave a second thought to where that data goes next — who can see it, whether it's stored, whether it's used to train the next version of the model. That gap between how AI tools feel (private, instant, disposable) and how they actually work (networked, logged, sometimes retained) is exactly what this lesson closes.
The Invisible Journey of Your Prompt
When you type a prompt into ChatGPT or Claude, you're not whispering into a void. Your text leaves your device, travels encrypted over HTTPS to a remote server — OpenAI's infrastructure for ChatGPT, Anthropic's for Claude — where it's processed by a large language model running on GPU clusters. The model generates a response, which travels back to your screen. That round trip takes one to ten seconds depending on prompt complexity and server load. What's less visible is everything happening around that core transaction: your prompt may be logged, associated with your account, reviewed by human trainers (under certain conditions), and in consumer-tier products, potentially used to improve future model versions. The word 'may' is doing a lot of work in that sentence — the specifics depend entirely on which product you're using, which plan you're on, and whether you've changed the default settings.
Every major AI provider distinguishes between consumer products and API or enterprise products, and this distinction changes everything about data handling. ChatGPT's free and Plus tiers (the ones most employees use without company approval) default to using conversations for model training unless you manually opt out in settings. OpenAI's API, by contrast, does not use inputs or outputs to train models by default. Claude on Claude.ai has a similar split: consumer accounts can have their conversations reviewed for safety and quality; Claude accessed via Anthropic's API carries stronger data non-use guarantees. Gemini on Google's consumer interface is subject to Google's broader data practices. These aren't edge cases or fine print traps — they're the designed default states of products built for different audiences, and most professionals are using the consumer version while assuming enterprise-grade privacy.
Human review is the detail that surprises most professionals. OpenAI's usage policies state that a small subset of conversations may be reviewed by human trainers to improve model safety and quality. Anthropic has similar provisions. This doesn't mean someone at OpenAI is reading your specific prompt about your client's acquisition — the scale makes that statistically unlikely. But it means the contractual possibility exists, and that matters enormously in regulated industries. A law firm sharing privileged client communications, a healthcare analyst pasting patient data, or a finance team uploading non-public market information can create serious liability through what feels like a routine productivity habit. The risk isn't theoretical — it's a compliance exposure that legal and IT teams increasingly flag in audits.
Check Your Settings Before Your Next Prompt
What 'Training Data' Actually Means for Your Business
When a provider uses your conversation for training, the process isn't a human copying your words into a database. It's more nuanced — and the nuance matters for how you assess the risk. Training typically involves using conversation examples to fine-tune model behavior: reinforcing helpful responses, reducing harmful ones, improving instruction-following. Your prompt about restructuring a client proposal doesn't become a retrievable fact that the model will quote back to someone else. That's not how LLMs work — they don't store and retrieve specific text the way a search engine indexes pages. Instead, patterns from many conversations subtly influence model weights over time. The practical risk is less about your exact words appearing in someone else's output and more about the principle: you've sent confidential business information to a third party's servers, and their terms of service permit certain uses of it. That's a data governance issue regardless of the technical mechanism.
Prompt
I'm an analyst preparing commentary for a board presentation. Here is a fictional dataset with the same structure as our real data:

Region A: 12% growth
Region B: -3% growth
Region C: 8% growth

Write three concise board-ready sentences interpreting this regional performance, flagging the underperforming region and recommending a follow-up action.
AI Response
Regional performance in the latest period shows strong momentum in Region A (+12%) and solid growth in Region C (+8%), while Region B declined 3% — the only market moving against trend. This divergence warrants immediate attention: Region B's underperformance may signal pricing pressure, distribution gaps, or competitive encroachment that the other regions have not yet experienced. The board should request a root-cause briefing from the Region B leadership team before the next planning cycle to determine whether intervention or resource reallocation is required.
The prompt example above demonstrates a technique called data anonymization or synthetic substitution — replacing real figures with structurally identical fictional ones before sending to an AI tool. The analyst gets the same quality of output (professional board language, logical structure, actionable recommendation) without transmitting actual company performance data to an external server. This is a practical workaround that works well for many use cases, though it requires discipline and isn't suitable for situations where the exact numbers are the whole point. It's one of several risk-mitigation strategies that effective AI users apply automatically, the same way experienced professionals learned to be careful about what they put in email.
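For teams that do this regularly, the substitution step can even be scripted so the real figures never leave the analyst's machine. The sketch below is a minimal illustration of the idea in Python; the dataset, the fixed-offset perturbation rule, and the function names are assumptions chosen for clarity, not a vetted anonymization method.

```python
# Minimal sketch of synthetic substitution: build a dataset with the same
# structure as the real one but fictional values, and send only the
# synthetic version to the AI tool. The figures and the fixed-offset
# perturbation are illustrative assumptions, not a vetted method.

real_growth = {"Region A": 0.12, "Region B": -0.03, "Region C": 0.08}  # stays local

def make_synthetic(data: dict, offset: float = 0.02) -> dict:
    """Return a structurally identical dict with uniformly shifted values."""
    return {region: round(value + offset, 2) for region, value in data.items()}

synthetic_growth = make_synthetic(real_growth)

prompt = (
    "Here is a fictional dataset with the same structure as our real data:\n"
    + "\n".join(
        f"{region}: {value:+.0%} growth" for region, value in synthetic_growth.items()
    )
    + "\nWrite three concise board-ready sentences interpreting this regional "
    "performance, flagging the underperforming region and recommending a follow-up action."
)

# `prompt` is what gets pasted into the AI tool; `real_growth` never leaves your machine.
print(prompt)
```

The uniform offset is only a stand-in for whatever substitution rule your team agrees on; the point is that the transformation runs locally and the real figures stay put.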
How the Major Tools Actually Compare
| Tool | Consumer Plan — Training Default | Enterprise/API — Training Default | Human Review Possible | Data Retention (Consumer) | Approx. Enterprise Cost |
|---|---|---|---|---|---|
| ChatGPT (OpenAI) | Yes — opt out in settings | No (API); No (Team/Enterprise) | Yes (consumer) | 30 days after deletion | $30/user/mo (Team) |
| Claude (Anthropic) | Yes — opt out in settings | No (API); No (Claude for Work) | Yes (consumer) | Deleted on request | $25/user/mo (Teams) |
| Gemini (Google) | Yes — Google data practices apply | No (Workspace/API) | Yes (consumer) | Google account retention rules | Included in Workspace plans |
| Microsoft Copilot | Yes (personal); No (M365 Copilot) | No (M365 Copilot) | Limited (M365) | Varies by Microsoft policy | $30/user/mo (M365 Copilot) |
| Perplexity | Yes (free/Pro consumer) | API: limited retention | Possible | 90 days | API pricing by token |
| Notion AI | Processed by OpenAI/Anthropic | Enterprise: DPA available | Governed by underlying model | Per Notion's retention policy | Included in Business plan |
The table above reveals a pattern that most professionals miss: the same underlying capability can carry very different data risks depending on the access path. Notion AI, for example, routes your prompts through OpenAI or Anthropic's models — meaning Notion's privacy policy and the underlying model provider's policy both apply. Microsoft Copilot in a personal context behaves like a consumer product; inside a properly licensed Microsoft 365 Copilot deployment, it operates under Microsoft's enterprise data protection commitments, including the EU Data Boundary for European customers. The brand name on the interface doesn't tell you the full story. You need to know the plan tier, the underlying infrastructure, and whether your organization has a data processing agreement (DPA) in place.
Data processing agreements are the legal mechanism that converts a consumer-grade tool into an enterprise-grade one from a compliance standpoint. A DPA is a contract between your organization and the AI provider that specifies what the provider can do with your data, confirms they won't use it for training, establishes security standards, and defines breach notification obligations. OpenAI, Anthropic, Google, and Microsoft all offer DPAs for their enterprise products — but you have to be on the right plan and your legal or procurement team has to have executed the agreement. If your team is using free or Plus tiers and no DPA exists, you're operating outside any formal data protection framework, which is an increasingly common audit finding as organizations get serious about AI governance.
Prompt
I need to draft a short, practical guideline for my team of 12 marketing professionals about what types of information they should and shouldn't paste into consumer AI tools like ChatGPT or Claude. The tone should be direct and helpful, not legalistic. Format it as a simple do/don't list with a one-sentence rationale for each item.
AI Response
**What to put into consumer AI tools — and what to keep out**

✅ DO:
- Paste anonymized or synthetic data (swap real figures for placeholders) — you get the same output quality without exposing real numbers.
- Use AI for structural and language tasks: rewriting, summarizing publicly available content, brainstorming campaign concepts — none of this requires sensitive input.
- Draft external-facing copy like ad scripts or email sequences — these will be public anyway, so no confidentiality is at stake.

🚫 DON'T:
- Paste client names, contract terms, or proposal details — this sends confidential third-party information to an external server your client never agreed to.
- Upload real financial data, sales figures, or forecasts — even in a 'just for formatting' context, the data is transmitted and potentially retained.
- Include employee names or performance details — this creates personal data processing obligations under GDPR and similar regulations.
- Share anything marked confidential, restricted, or commercially sensitive — if you'd hesitate to post it on LinkedIn, don't paste it into a free AI tool.
What Changes at Work on Monday
Understanding the data journey doesn't mean using AI less — it means using it smarter. The most effective professionals develop a quick mental filter that runs before they paste anything into an AI interface: Is this information public or internal? Does it identify specific people, clients, or deals? Is it marked confidential or subject to regulatory requirements? If the answer to any of those questions creates hesitation, the response is either to anonymize the input (as the analyst example showed) or to use an enterprise-tier tool where a DPA is in place. This filter takes about three seconds once it's habitual, and it dramatically reduces compliance exposure without meaningfully slowing down work.
The second Monday change is checking which tier of tool you're actually using. Most professionals assume they're using a work-appropriate version of a tool when they're actually on a personal account or a consumer plan. Log into ChatGPT right now and look at the top-left corner — does it say 'ChatGPT' (consumer) or show a company workspace? Check Claude.ai — is there a team or organizational indicator? If you're using Microsoft Copilot, is it the free Bing-based version or the M365 Copilot integrated into your licensed Office environment? These are not trivial distinctions. They determine the entire legal and technical framework governing your data, and most employees have never been told to check.
For managers and team leads, the implication extends beyond personal practice. If your team uses AI tools — and statistically, they do, whether you've sanctioned it or not — you're responsible for ensuring they understand these distinctions. A single employee pasting client data into a free ChatGPT account can create contractual breaches with that client, regulatory exposure under GDPR or CCPA, and reputational risk if it ever surfaces in a dispute. The good news is that basic guidance — a short do/don't list like the one in the prompt example above — meaningfully reduces that risk and takes an afternoon to produce and distribute. You don't need a full AI policy to make a real difference; you need a practical starting point that your team will actually read.
Goal: Produce a clear inventory of your team's AI tool usage with risk levels assigned, and a practical one-page guidance note your team can apply immediately.
1. Open a spreadsheet and create four columns: Tool Name, Plan/Tier, Who Uses It, Data Risk Level (Low/Medium/High).
2. List every AI tool your team currently uses — include ChatGPT, Claude, Gemini, Copilot, Perplexity, Notion AI, Grammarly, and any others you've seen in use.
3. For each tool, identify the specific plan or tier: free, Plus, Teams, Enterprise, API, or Microsoft 365 licensed. Check with team members if unsure.
4. Look up whether your organization has a data processing agreement with each provider — ask your legal, IT, or procurement team.
5. Assign a data risk level: Low = enterprise plan with DPA, Medium = paid consumer plan with opt-out configured, High = free consumer plan with default settings.
6. Identify the two highest-risk tools and write one sentence explaining the specific risk (e.g., 'Free ChatGPT accounts used by three analysts — no DPA, training opt-out not confirmed, client data likely shared').
7. Draft a single-page guidance note for your team covering: which tools are approved for sensitive data, which are approved for general tasks only, and which should not be used for work data until further notice.
8. Share the guidance note with your team and schedule a 15-minute discussion to answer questions and collect feedback on tools you may have missed.
9. Set a calendar reminder for 60 days to repeat this audit — provider policies and available plans change regularly.
How Data Risk Looks Different by Role
- Consultants: Your highest risk is client confidentiality. Pasting proposal language, org charts, financial models, or strategic plans into consumer AI tools can breach NDAs and client contracts — even if the output is excellent. Anonymize aggressively or use a firm-approved enterprise tool.
- Marketing managers: Campaign briefs and creative copy are generally low-risk; customer data, CRM exports, or any file containing email addresses or behavioral data is high-risk and often regulated under GDPR or CCPA.
- Financial analysts: Non-public financial information (earnings forecasts, M&A targets, budget actuals) is regulated under securities law in many jurisdictions. Consumer AI tools are categorically inappropriate for this data — full stop.
- HR professionals: Employee names, performance ratings, compensation data, and health information are personal data under virtually every privacy regulation. Using consumer AI to rewrite performance reviews creates personal data processing obligations you almost certainly haven't met.
- IT and operations: System architecture diagrams, network configurations, and access credentials are security-sensitive. Sharing these with any external service — AI or otherwise — should require security review.
- Sales teams: Customer names, deal sizes, and pipeline data are often covered by NDAs with customers and by your own company's data classification policies. A deal summary pasted into ChatGPT is external disclosure.
When Your Data Trains the Model: The Opt-In You Didn't Read
Picture this: a consultant on your team pastes a client's restructuring proposal into ChatGPT to get a sharper executive summary. The output is excellent. She saves time, the client never knows, and nothing bad happens — this week. But buried in the terms she accepted six months ago is a clause that, depending on her account type and settings, may have just fed that client's strategic thinking into a training dataset used to improve future model versions. This isn't hypothetical paranoia. It's the default behavior of several major AI tools unless you actively change it. Most professionals don't know the difference between inference-only processing and training data collection — and that distinction determines your actual risk exposure.
Inference vs. Training: The Distinction That Actually Matters
Every time you send a prompt, the AI model processes it to generate a response. That processing is called inference — your input goes in, computation happens, output comes back. In a privacy-safe setup, inference is the whole story: your data is used once, transiently, then discarded. Training is different. Training means your prompt, or data derived from it, is stored and used to update the model's weights — permanently shaping how it responds to future users. OpenAI's free ChatGPT tier, by default, uses conversation data to improve its models. The API (used by developers building products) does not, by default, use data for training. This single distinction — same company, different product tier — produces completely different risk profiles for the same task.
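For readers who have never seen the API access path, the snippet below shows roughly what it looks like using OpenAI's official Python SDK. The model name and prompt are placeholders, and the no-training-by-default guarantee is a provider policy rather than anything the code enforces, so treat this as an illustrative sketch only.

```python
# Illustrative sketch of the API access path (OpenAI Python SDK, v1.x).
# Per the provider policy described above, API inputs and outputs are not
# used for training by default -- but that is a contractual/policy matter,
# not something this code controls. Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use whatever your plan provides
    messages=[
        {"role": "user", "content": "Rewrite this anonymized summary for a board audience: ..."}
    ],
)

print(response.choices[0].message.content)
```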
The practical implication is that the AI tool your company pays for through an enterprise agreement behaves very differently from the free browser version your team discovered on their own. Microsoft Copilot for Microsoft 365, sold to enterprises, explicitly commits to not using customer data to train foundation models. The free Copilot at copilot.microsoft.com operates under consumer terms with different defaults. Google Workspace's Gemini for Business similarly separates enterprise data from Google's broader training pipelines. When a team member uses their personal Gmail account to access Gemini instead of the company Workspace account, they've crossed from one data regime into another — without any visible signal that anything changed. This is exactly where data leakage happens in practice.
The Account-Type Trap
What 'Data Retention' Actually Looks Like Across Tools
Retention policies determine how long your prompts and outputs are stored on a provider's servers after your session ends. This matters for two reasons: regulatory compliance (GDPR's data minimization principle, for example, requires that personal data isn't kept longer than necessary) and breach exposure (data that isn't stored can't be stolen). Anthropic retains Claude.ai conversations for up to 90 days by default, though enterprise customers negotiate custom terms. OpenAI retains API call data for 30 days for abuse monitoring before deletion. Perplexity, which many analysts use for research tasks, stores search history and AI interactions and links them to your account if you're logged in. The point isn't that any of these companies are behaving badly — it's that each one has made a specific engineering and business decision about retention, and you should know what it is before sending sensitive content.
| Tool | Default Training Use | Retention Period | Enterprise/API Exception | Key Control Available |
|---|---|---|---|---|
| ChatGPT (Free/Plus) | Yes — conversations used to improve models | Until deleted by user | ChatGPT Team/Enterprise: no training use | Toggle off in Settings > Data Controls |
| ChatGPT API | No — not used for training by default | 30 days (abuse monitoring) | Zero retention option available | Set via API dashboard |
| Claude.ai (Free/Pro) | No — Anthropic does not train on Claude.ai conversations | 90 days default | Enterprise: custom retention terms | Delete conversations manually |
| Gemini (Personal Google) | Yes — may be reviewed by humans to improve AI | Up to 18 months by default | Google Workspace Business/Enterprise: excluded from training | Gemini Apps Activity toggle in Google Account |
| Microsoft Copilot (Consumer) | Yes — consumer terms apply | Variable | Microsoft 365 Copilot (enterprise): no training use | Requires enterprise license |
| GitHub Copilot (Individual) | Code snippets may be used for training | Varies | GitHub Copilot Business: no training use | Settings toggle; Business plan recommended |
| Notion AI | No — Notion does not train on workspace content | Per Notion workspace retention | Enterprise adds DPA options | Review workspace data settings |
| Perplexity | May use interactions to improve product | Linked to account if logged in | No enterprise tier currently | Use without account for less linkage |
Seeing This in Action: Two Professional Scenarios
Abstract policy language only becomes useful when you can map it to something you actually do at work. The two prompt examples below show the same professional task handled in two different ways — one that creates real exposure and one that doesn't. Notice that the quality of the output doesn't change. The risk difference is entirely in what data gets sent and where it goes. This is the core skill: separating what the AI needs to know from what it doesn't, so you get the same useful output without transmitting sensitive details that have no business being in someone else's data center.
Prompt
Here is our Q3 pipeline report for Meridian Financial Services. Their CFO is Sarah Chen. They're evaluating a $4.2M contract renewal. Our win probability is currently 47%. Key risk: competitor Axon Consulting submitted a lower bid last week. Write a one-page internal briefing for our VP of Sales summarizing the situation and recommending next steps.
AI Response
This prompt sends real client names, deal values, internal win probabilities, and competitive intelligence directly into a free-tier tool that may use conversations for training. The output quality will be excellent — but the CFO's name, the deal size, the competitor's name, and your internal assessment now exist in a third-party system under consumer data terms. If Meridian Financial Services has a data processing agreement with your company, this may already be a compliance violation.
Prompt
I'm preparing an internal briefing for a VP of Sales. Here's the situation: a large financial services client (contract value ~$4M) is up for renewal in Q3. Internal win probability is around 47%. Main risk is a competing bid from another consultancy that came in lower on price last week. The client relationship is strong but the economic buyer is new. Write a one-page briefing summarizing the situation and recommending next steps to strengthen our position before the decision.
AI Response
This prompt produces an equally useful briefing. The VP gets the same strategic context. But no client name, no contact name, no exact deal value, and no competitor name has left your organization. The AI has everything it needs to give you a sharp, practical output — the identifying details were never necessary for the task. This is data minimization in practice: send the shape of the problem, not the personally identifying specifics.
The Aggregation Problem: When Innocent Inputs Combine
One underappreciated risk isn't any single prompt — it's the accumulation of prompts over time within a single session or account. A marketing manager might ask ChatGPT to help draft a campaign brief (mentions the brand), then ask it to analyze competitor positioning (names the competitors), then ask it to help write a job description for a new hire on the team (reveals headcount plans), then ask for talking points for a board meeting (reveals strategic priorities). Each individual prompt seems innocuous. Together, they constitute a detailed picture of your company's strategic situation, competitive intelligence, hiring plans, and board-level concerns — all linked to your account, all potentially retained. This aggregation risk is why enterprise agreements typically include not just training exclusions but also strict data isolation guarantees: your data doesn't mix with other customers' data in the provider's systems.
Addressing aggregation risk requires a different mental model than one-prompt-at-a-time thinking. The practical discipline is to treat your AI conversation history the way you'd treat a shared whiteboard in a glass-walled conference room: visible, persistent, and not the place for anything you wouldn't want others to see. For high-sensitivity work, use new sessions rather than continuing long conversations that build up context. On ChatGPT, you can disable chat history for a session — the conversation won't be saved and won't be used for training. On Claude.ai, you can delete individual conversations. These aren't theoretical privacy controls; they're the specific actions that determine whether your Monday morning strategy session stays internal.
The Anonymization Reflex
How Your Role Changes What You Need to Protect
Data risk isn't uniform across job functions. A financial analyst and a marketing manager both use AI tools, but they're handling very different categories of sensitive information — and the regulatory and contractual stakes differ accordingly. GDPR and CCPA create specific obligations around personal data (names, contact details, behavioral data). SEC regulations create obligations around material non-public information. Attorney-client privilege creates obligations around legal strategy. Healthcare data triggers HIPAA. Knowing which regime applies to your work tells you how carefully you need to approach any given AI task — and whether you need legal or compliance sign-off before using AI tools at all for that category of work.
- Managers and executives: Your highest-risk inputs are strategic plans, M&A discussions, board materials, and headcount decisions. These rarely need to go into AI tools verbatim — anonymize and abstract before prompting.
- Marketing and brand teams: Customer data, campaign performance figures, and audience segmentation data are often personal data under GDPR/CCPA. Use aggregate or synthetic data for AI tasks; never paste CRM exports into free-tier tools.
- Analysts and data teams: Model outputs, financial forecasts, and pricing data can constitute material non-public information in public companies. Confirm with compliance whether AI tool use for these tasks requires pre-approval.
- Consultants and advisors: Client confidentiality agreements frequently prohibit sharing client information with third-party systems without explicit consent. Check your engagement terms before using any AI tool on client work — this isn't optional.
- HR and people teams: Salary data, performance reviews, and candidate information are personal data with strong protections in most jurisdictions. These should never enter consumer-tier AI tools and require DPA coverage even in enterprise tools.
- Legal and compliance teams: Attorney-client privilege is not preserved when information passes through a third-party AI system. For any legally sensitive work, use locally deployed models or get explicit legal guidance on which tools are permissible.
- Sales teams: Deal-specific data, pipeline figures, and competitive intelligence are often covered by NDAs with clients. Use the anonymization technique from the prompt examples above — it solves 90% of the risk without slowing you down.
Third-Party Integrations: The Hidden Data Path
The data flow gets more complicated the moment you connect AI tools to other systems. Notion AI has access to your entire Notion workspace — every page, database, and document you've stored there. GitHub Copilot has access to your codebase, including any credentials, API keys, or proprietary algorithms that may be embedded in it. When you install a ChatGPT plugin or a GPT Action that connects to your CRM, calendar, or file storage, you're creating a data pathway from those systems into the AI provider's infrastructure. Each integration multiplies the surface area of data that could be processed, retained, or — in a breach scenario — exposed. Enterprise procurement teams are increasingly requiring AI vendors to complete security questionnaires specifically about third-party data flows, and for good reason.
The discipline here is straightforward: treat AI integrations with the same scrutiny you'd apply to any new SaaS tool that requests access to your company data. Before enabling Copilot in Microsoft 365, understand what SharePoint documents it can access and whether your sensitivity labels are configured to restrict it. Before connecting a Zapier workflow that routes customer emails through an AI summarization step, confirm which AI provider Zapier is using for that step and under what data terms. The integrations that feel like convenient automation are also the integrations most likely to move data through systems you haven't explicitly evaluated. Visibility into the full data path — not just the AI tool itself — is the standard you're aiming for.
Goal: Produce a clear, current picture of your team's AI tool usage and data exposure — and implement at least one concrete change to reduce risk before the end of the week.
1. List every AI tool your team currently uses — include browser extensions, embedded features (like Notion AI or Gmail's Help Me Write), and any tools team members use with personal accounts for work tasks.
2. For each tool, identify the account type in use: personal/free, individual paid, or enterprise/business license. Note any tools where different team members use different account types.
3. Open the privacy or data settings for your two most-used tools. Screenshot or note the current state of any training opt-out, chat history, or data retention toggles.
4. For ChatGPT specifically, go to Settings > Data Controls and confirm whether 'Improve the model for everyone' is enabled or disabled. If enabled on a non-enterprise account used for work, flag this for your team.
5. Identify one category of work your team regularly does with AI tools that involves real names, client names, or financial figures. Write a one-sentence anonymization rule for that category (e.g., 'All client names replaced with industry descriptor before prompting').
6. Check whether any AI tools are integrated with shared systems — your CRM, file storage, email, or project management tools. List what data each integration can access.
7. Draft a one-paragraph 'AI data handling note' for your team — three sentences maximum — that tells them which account type to use, what not to paste in directly, and where to go with questions. Share it in your team's main communication channel this week.
8. Identify one tool on your list where you cannot confirm the data handling terms. Flag it for follow-up with your IT or legal team before further use on sensitive work.
Picture this: it's Wednesday afternoon and your CFO asks you to draft a board memo summarising last quarter's revenue shortfall. You open ChatGPT, paste in the raw financials, and start typing. Within 90 seconds you have a polished first draft. What you may not have considered is that those revenue figures, your company name, and the context you provided are now sitting inside a prompt that a commercial AI provider has received, logged, and — depending on your account type — may use to improve its models. This is not a hypothetical risk. It is the default behaviour of most consumer-tier AI tools, and it happens silently every time you hit send.
The Data Lifecycle Inside an AI Tool
When you submit a prompt to any major AI tool, your text travels through at least three distinct stages: transmission, processing, and retention. During transmission, your prompt moves from your device to the provider's servers — encrypted in transit, but fully readable once it arrives. During processing, the model generates a response using your input as immediate context. During retention, the provider stores a log of the exchange for purposes that vary by product tier and jurisdiction. OpenAI's default settings for free and Plus ChatGPT users allow conversation data to be used for model training unless you actively opt out in Settings → Data Controls. Anthropic's Claude and Google's Gemini follow similar patterns, with enterprise contracts offering stronger exclusions. The practical implication is simple: treat every free-tier prompt like a postcard, not a sealed envelope.
Retention windows are a detail most professionals overlook. OpenAI retains conversation data for up to 30 days for safety monitoring even when training opt-out is enabled. Microsoft's Copilot for Microsoft 365 — the enterprise product integrated into Teams, Word, and Outlook — stores prompt and response data in your organisation's Microsoft 365 compliance boundary, which is a fundamentally different posture. Your IT or legal team has access to those logs through the compliance portal. That means a sensitive HR conversation you had with Copilot in a Word document could, in principle, surface during an eDiscovery request. Knowing which tier you are on is not optional knowledge — it is baseline professional hygiene.
Check Your Tier Before You Type
Matching Tool Choice to Data Sensitivity
Not every AI tool is appropriate for every task. The right mental model is a sensitivity ladder: low-sensitivity work (brainstorming, public research, editing non-confidential copy) is fine on any tool. Medium-sensitivity work (internal strategy documents, anonymised client data, HR policy drafts) belongs on enterprise-tier tools where your organisation controls the data boundary. High-sensitivity work (personal data covered by GDPR, M&A financials, legally privileged communications) should either use a privately deployed model or avoid AI assistance entirely until your legal team has reviewed the setup. The table below maps real tools to these tiers so you can make that call quickly.
| Tool | Default Tier | Training Opt-Out Available? | Enterprise Data Boundary? | Suitable Sensitivity Level |
|---|---|---|---|---|
| ChatGPT Free/Plus | Consumer | Yes (manual) | No | Low |
| ChatGPT Team | SMB | Yes (automatic) | Partial | Low–Medium |
| ChatGPT Enterprise | Enterprise | Yes (automatic) | Yes | Medium–High |
| Claude.ai (free/Pro) | Consumer | Limited | No | Low |
| Claude for Enterprise | Enterprise | Yes | Yes | Medium–High |
| Microsoft Copilot M365 | Enterprise | Yes (admin-controlled) | Yes (M365 boundary) | Medium–High |
| Google Gemini (free) | Consumer | Yes (manual) | No | Low |
| Google Gemini for Workspace | Enterprise | Yes | Yes | Medium–High |
| Perplexity (free) | Consumer | No | No | Low |
| GitHub Copilot Business | Enterprise | Yes | Yes | Medium |
Rewriting Prompts to Remove Sensitive Data
The most practical skill you can build right now is prompt sanitisation — the habit of stripping identifying information before you submit. This does not mean avoiding AI for real work. It means substituting generic placeholders for specifics, then re-inserting the real details into the AI's output yourself. A prompt containing 'Acme Corp's Q3 EBITDA of $4.2M' becomes 'a mid-size manufacturing client with Q3 EBITDA of $4.2M.' You get the same analytical value; the provider receives no client-identifiable data. This technique takes about 30 seconds and eliminates the most common data-exposure vector professionals encounter.
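If you find yourself doing this substitution often, a tiny local script can make the swap-and-restore round trip less error-prone. The sketch below is one way to do it in Python; the placeholder map and the company names (borrowed from the fictional scenario that follows) are purely illustrative.

```python
# Minimal sketch of prompt sanitisation: swap identifying details for
# generic placeholders before submitting, then re-insert the real details
# into the AI's output locally. The entries below are fictional examples.

PLACEHOLDERS = {
    "Meridian Logistics": "Client A",   # real client name -> neutral label
    "RetailCo": "Customer X",           # named counterparty -> neutral label
}

def sanitise(text: str) -> str:
    """Strip identifying names before the prompt leaves your machine."""
    for real, generic in PLACEHOLDERS.items():
        text = text.replace(real, generic)
    return text

def restore(text: str) -> str:
    """Put the real names back into the AI's output, locally."""
    for real, generic in PLACEHOLDERS.items():
        text = text.replace(generic, real)
    return text

raw = "Summarise Q3 results for Meridian Logistics after the RetailCo contract loss."
print(sanitise(raw))
# -> "Summarise Q3 results for Client A after the Customer X contract loss."
```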
Prompt
Draft an executive summary of Meridian Logistics' Q3 results. Revenue was $18.7M, down 12% YoY. EBITDA was $2.1M. The main driver was a contract loss with their largest client, RetailCo.
AI Response
This prompt exposes the client's name, precise financials, and a sensitive commercial event to a consumer-tier model. If this data is used in training, it becomes part of a model that millions of users query. A competitor could theoretically surface it through a well-crafted prompt.
Prompt
Draft an executive summary for a mid-size logistics company. Q3 revenue was $18.7M, down 12% YoY. EBITDA was $2.1M. The primary driver was the loss of a major client contract. Tone should be candid but constructive for a board audience.
AI Response
Q3 performance reflected the impact of a significant client departure, with revenue declining 12% year-over-year to $18.7M and EBITDA compressing to $2.1M. While this result falls short of plan, the underlying business retained its remaining client base and operating cost structure. Management's immediate priority is pipeline acceleration and contract diversification to restore revenue momentum in Q4.
What Changes at Work on Monday
Awareness without a new habit is just anxiety. The concrete change to make this week is a 60-second pre-prompt checklist. Before submitting any prompt, ask three questions: Does this contain a real person's name or contact details? Does this contain financial data tied to a named organisation? Does this contain information I would not share publicly? If any answer is yes, either sanitise the prompt or switch to an enterprise-tier tool where your organisation controls the data. Pin this checklist to your browser bookmarks bar or add it as a note in Notion so it becomes a reflex, not a deliberate pause.
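Some teams go a step further and wire the checklist into a small local script that nags them before anything sensitive is pasted. The sketch below is a crude illustration in Python; the regex patterns are rough heuristics invented for this example and will miss plenty (real names above all), so the three questions in your head remain the real control.

```python
# Rough sketch of the pre-prompt checklist as an automated nudge.
# The patterns are crude, illustrative heuristics (emails, currency amounts,
# confidentiality markers) -- they supplement the mental checklist, not replace it.
import re

CHECKS = {
    "contains an email address": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "contains a currency amount": r"[$€£]\s?\d[\d,.]*\s?[MmKkBb]?",
    "contains a confidentiality marker": r"\b(confidential|internal only|do not distribute|nda)\b",
}

def pre_prompt_check(prompt: str) -> list:
    """Return the reasons to pause before submitting this prompt, if any."""
    return [
        reason
        for reason, pattern in CHECKS.items()
        if re.search(pattern, prompt, flags=re.IGNORECASE)
    ]

warnings = pre_prompt_check("Board memo draft: Q3 revenue shortfall of $4.2M. Confidential.")
if warnings:
    print("Pause before sending:", "; ".join(warnings))
```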
Teams and managers have an additional responsibility: the people they lead are using AI tools right now, often without guidance. A single team conversation — 15 minutes in a standup or a one-page Confluence note — can shift group behaviour significantly. Share the sensitivity ladder from this lesson. Clarify which tools your organisation has enterprise agreements with. Establish a shared rule: client names and personal data stay off consumer-tier tools, full stop. You do not need a formal policy to start; a clear team norm reduces risk immediately while formal processes catch up.
The professionals who get the most value from AI over the next two years will not be the ones who use it most recklessly or the ones who avoid it out of fear. They will be the ones who use it precisely — knowing which tool fits which task, protecting sensitive data as a reflex, and building team habits that scale. Every interaction you have with an AI tool is a data transaction. Understanding the terms of that transaction is what separates informed use from accidental exposure.
Goal: Produce a personalised, living reference card that maps your actual AI tool usage to appropriate data sensitivity levels, with a pre-prompt checklist you will use every day.
1. Open a blank document in Word, Notion, or Google Docs and title it 'AI Tool Data Safety — My Reference Card'.
2. List every AI tool you currently use at work (e.g., ChatGPT, Copilot, Gemini, Perplexity, Notion AI, GitHub Copilot).
3. For each tool, note whether you are on a consumer or enterprise tier — check your account settings or ask your IT team if unsure.
4. For each tool, record whether training opt-out is enabled. Toggle it on in any consumer-tier tool where it is currently off.
5. Write three columns next to each tool: 'Safe to use for' / 'Use with caution for' / 'Do not use for' — and populate them based on the sensitivity ladder from this lesson.
6. Draft your personal 3-question pre-prompt checklist (real names? named financials? non-public information?) and add it to the card.
7. Identify one colleague or direct report who regularly uses AI tools and schedule a 10-minute conversation to share your sensitivity ladder.
8. Save the card somewhere you will actually see it — pinned in Notion, bookmarked, or printed and taped to your monitor.
9. Set a calendar reminder for 30 days from now to review and update the card as tools and your usage evolve.
- Every prompt is a data transaction — consumer-tier tools log, retain, and may train on your input by default.
- The three-stage lifecycle (transmission, processing, retention) applies to every AI tool; retention windows and training use vary by tier.
- Enterprise-tier products (ChatGPT Enterprise, Copilot M365, Gemini for Workspace, Claude for Enterprise) keep data inside your organisation's compliance boundary.
- Prompt sanitisation — replacing identifying details with generic placeholders — takes 30 seconds and eliminates the most common exposure vector.
- The sensitivity ladder (low / medium / high) gives you a fast decision rule for matching task to tool.
- Team norms reduce risk immediately; you do not need a formal policy to start protecting your colleagues' work.
- Your pre-prompt checklist (real names? named financials? non-public information?) is the single most reusable habit from this lesson.
A consultant pastes a client's full name and revenue figures into ChatGPT Plus to draft a memo. Training opt-out is NOT enabled. What is the most accurate description of the risk?
Which kinds of tasks are appropriate for a free-tier consumer AI tool like Claude.ai or ChatGPT Free, and which are not?
An analyst wants to use AI to analyse Q3 financials for a named client on a consumer-tier tool. What is the best approach?
Your organisation uses Microsoft Copilot for Microsoft 365. A colleague asks whether it is safe to draft an HR performance review using Copilot in Word. What is the most accurate answer?
A manager wants to establish a quick team norm around AI data safety without waiting for a formal company policy. Which action has the highest immediate impact?
