What not to share with public AI tools
It's Tuesday afternoon. Your CFO needs a competitive analysis by Thursday. You open ChatGPT, paste in last quarter's revenue figures, a list of your top five clients by spend, and a draft of the pricing strategy your team spent three months building. You ask for a summary and some strategic framing. The output is excellent. You hit send on the deck. What you didn't think about: that data now exists outside your organization's walls, processed by a third-party system under terms of service most professionals have never read. This isn't a hypothetical. It's happening in offices everywhere, every day, often by smart people who simply haven't been told where the line is.
Why Public AI Tools Create Real Exposure
Public AI tools — ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Perplexity, Copilot (Microsoft's consumer version) — are designed to be used by anyone. That's their strength and your risk. When you type into the free or standard paid tiers of these tools, your input is processed on external servers. Depending on the platform's current settings and your account type, that data may be used to improve future models, stored in logs accessible to company employees, or retained for varying periods. OpenAI's default data retention for ChatGPT Free users has historically been up to 30 days of conversation history, with the option to turn off training — but unless you explicitly change that setting, your conversations are used to improve the model by default. Most users never change it.
The core issue isn't that these companies are malicious. It's that public AI tools operate under consumer-grade data agreements, not enterprise-grade ones. When your legal team signs a contract with Salesforce or your IT department provisions Microsoft 365, there are data processing agreements, security standards, and liability clauses in place. When a marketing manager signs up for ChatGPT Plus at $20/month on a personal card to get work done faster, none of that infrastructure exists. The data governance frameworks your company built over years don't automatically extend to tools people adopt on their own. This gap — between sanctioned enterprise software and individual AI tool use — is where most accidental exposure happens.
The risk compounds because AI tools are genuinely useful for exactly the kinds of tasks that involve sensitive information. Summarizing meeting notes, drafting proposals, analyzing financial models, preparing client presentations — these are high-value AI use cases, and they're also the tasks most likely to involve data your company would not want shared externally. A consultant drafting engagement proposals, an analyst building a market model, a manager writing a performance review: all of these are perfect AI tasks, and all of them touch information that needs protection. The tool's usefulness is precisely what makes the risk easy to overlook.
Samsung's $1M Lesson
The Six Categories You Must Protect
Not all sensitive information carries the same risk profile, but six categories consistently appear in the data incidents professionals create with AI tools. Personal data covered by regulations like GDPR or HIPAA tops the list — names, email addresses, health records, financial account details, or any combination of identifiers that could be traced back to an individual. If you're in healthcare, HR, financial services, or any consumer-facing role, this category is your highest-stakes exposure. Pasting a spreadsheet of customer names and purchase histories into ChatGPT to ask for segmentation ideas isn't just a privacy risk — it may be a regulatory violation with fines attached. GDPR fines can reach €20 million or 4% of global annual turnover, whichever is higher.
- Personal data: Names, emails, phone numbers, addresses, health data, financial account details, or any data that identifies a specific individual
- Proprietary business information: Revenue figures, margin data, pricing strategies, unreleased product roadmaps, M&A targets, or internal forecasts
- Client and customer data: Client names, contract terms, spend data, business challenges shared in confidence, or anything covered by an NDA
- Credentials and access: API keys, passwords, login tokens, or internal system URLs that could enable unauthorized access
- Legal and compliance materials: Pending litigation details, regulatory filings in progress, legal opinions, or settlement terms
- Employee information: Performance reviews, compensation data, disciplinary records, or health-related accommodations
These six categories aren't equally obvious in practice. Credentials are easy — nobody thinks pasting a password into ChatGPT is fine. But proprietary business information is where professionals most frequently make mistakes, because the data feels abstract. A revenue forecast isn't a person's name. A pricing model doesn't have a face attached to it. But that data represents competitive advantage your organization has spent real money to develop, and sharing it with a public AI system potentially puts it in a training corpus that could, in some form, surface in responses to competitors. The abstraction makes it feel lower-stakes. It isn't.
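If you want a mechanical backstop for that judgment call, a few lines of script can flag the most obvious patterns before anything is pasted. The sketch below is a minimal illustration in Python, not a vetted scanner: the regular expressions are illustrative heuristics for emails, phone numbers, dollar figures, and key-like strings, and they will not catch client names or contextual identifiers, so they supplement the six-category scan rather than replace it.

```python
import re

# A minimal sketch of a pre-submission check. The patterns below are illustrative
# heuristics only: they catch mechanically recognizable data (emails, phone
# numbers, dollar figures, key-like strings), not client names or context.
PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone number": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
    "dollar figure": re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?\s?[MKBmkb]?\b"),
    "possible API key or token": re.compile(r"\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9]{16,}"),
}

def flag_sensitive(prompt: str) -> list[str]:
    """Return the names of any patterns found in the draft prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]

if __name__ == "__main__":
    draft = "Summarize Q3 for jane.doe@client.com: revenue was $4.2M, pipeline $1.1M."
    for hit in flag_sensitive(draft):
        print(f"Check before submitting: draft appears to contain a {hit}.")
```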
How Different AI Tools Handle Your Data
Not all AI tools treat your data the same way, and the differences matter enormously for how you use them at work. The tier you're on — free, paid consumer, or enterprise — is often the most important variable. ChatGPT Enterprise and Claude for Enterprise both offer contractual guarantees that your data won't be used for training, plus stricter retention controls. The free tier of the same product may offer no such protection. Perplexity's Pro plan stores search history but allows deletion. GitHub Copilot for Business explicitly excludes your code from training data. Understanding these distinctions lets you make deliberate choices rather than hoping for the best.
| Tool | Free Tier Data Use | Paid Consumer Tier | Enterprise/Business Tier | Key Control Available |
|---|---|---|---|---|
| ChatGPT (OpenAI) | May use inputs for training by default | ChatGPT Plus: training opt-out available in settings | ChatGPT Enterprise: no training on your data, contractual guarantee | Toggle off 'Improve the model' in Data Controls |
| Claude (Anthropic) | Free tier: inputs may be reviewed by staff for safety | Claude Pro: reduced retention, but not zero | Claude for Enterprise: contractual data protection, SSO | Enterprise contract required for full protection |
| Gemini (Google) | Inputs reviewed by human reviewers by default | Gemini Advanced: still subject to Google's standard terms | Google Workspace with Gemini: admin controls, no training use | Workspace admin must enable data protection settings |
| GitHub Copilot | N/A — no free tier | Copilot Individual: code snippets may be used for training | Copilot Business/Enterprise: code excluded from training | Business/Enterprise subscription required |
| Perplexity | Searches logged and stored | Pro: history stored, manual deletion available | Enterprise Pro: data isolation, SOC 2 compliance | Manual history deletion in account settings |
| Notion AI | Subject to Notion's standard privacy policy | Plus/Business: standard Notion terms apply | Enterprise: data processed via OpenAI with DPA in place | Enterprise contract with DPA required |
The Tier Test
What You Can Do Differently Starting Monday
The practical shift isn't to stop using AI tools — that would mean surrendering a genuine productivity advantage. The shift is to develop a consistent habit of substitution before submission. Before pasting anything into a public AI tool, scan it for the six categories above and replace identifiable or sensitive details with placeholders. Real client name becomes 'Client A.' Actual revenue figures become 'Revenue X.' Specific employee names in a performance context become 'the employee.' This takes 60 to 90 seconds and preserves almost all of the AI's usefulness while eliminating the exposure. The model doesn't need your client's actual name to help you write a proposal. It needs the structure of what you're trying to say.
Prompt
We're preparing a proposal for Meridian Healthcare Group, who spent $2.3M with us last year. Their main concern is reducing their claims processing time from 14 days to under 5. Our proposed solution costs $180K annually. Write an executive summary for the proposal.
AI Response
This version pastes a real client name, exact spend figures, and proprietary pricing into a public AI tool. If you're on ChatGPT Free or standard Claude, this data is now outside your organization's control. The output might be great — but the input created real exposure for your client relationship and your pricing strategy.
Prompt
We're preparing a proposal for a large healthcare client (Client A) who is a significant existing customer. Their main challenge is reducing a core processing cycle from 14 days to under 5. Our proposed solution is an annual software subscription. Write an executive summary that emphasizes operational efficiency gains and ROI. Tone: confident, concise, boardroom-ready.
AI Response
Executive Summary

Client A faces a critical operational bottleneck: a 14-day processing cycle that industry benchmarks and internal targets demand be reduced to under 5 days. Left unaddressed, this gap creates downstream costs in staffing, compliance exposure, and customer satisfaction.

Our proposed solution delivers end-to-end process automation purpose-built for healthcare operations. Clients in comparable deployments have achieved cycle time reductions of 60–75% within the first 90 days of implementation. The investment is structured as an annual subscription, providing Client A with predictable cost management, continuous platform updates, and dedicated implementation support — without capital expenditure.

We recommend a phased rollout beginning with the highest-volume processing queues, enabling measurable ROI before full deployment. We are confident this engagement will deliver both the operational targets and the long-term scalability Client A requires.
Notice what the sanitized prompt preserved: the business context, the challenge, the tone requirement, the structural need. The AI produced an output that's immediately usable and professionally strong. What changed is that no real client name, no actual spend figure, and no specific pricing appeared in the submission. When you add the real details back in during your editing pass — which takes under two minutes — you have a finished document that was largely written by AI without any sensitive data ever leaving your organization's control. This is the core technique: AI does the structural and linguistic heavy lifting, you supply the sensitive specifics only in the final edit.
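For teams comfortable with a short script, the substitute-then-restore pass can be semi-automated. Below is a minimal sketch in Python, assuming you keep a local mapping of real values to placeholders; the specific names and figures are the hypothetical ones from the example above. The mapping never leaves your machine — only the sanitized text goes to the AI tool, and the real details are reinserted during your local editing pass.

```python
# A minimal sketch of the substitute-then-restore workflow. The mapping below is
# illustrative; it stays on your machine, and only sanitized text is submitted.
SUBSTITUTIONS = {
    "Meridian Healthcare Group": "Client A",
    "$2.3M": "[ANNUAL SPEND]",
    "$180K": "[PRICE]",
}

def sanitize(text: str) -> str:
    """Replace sensitive specifics with placeholders before submission."""
    for real, placeholder in SUBSTITUTIONS.items():
        text = text.replace(real, placeholder)
    return text

def restore(text: str) -> str:
    """Reinsert the real details during your local editing pass."""
    for real, placeholder in SUBSTITUTIONS.items():
        text = text.replace(placeholder, real)
    return text

draft = ("We're preparing a proposal for Meridian Healthcare Group, who spent "
         "$2.3M with us last year. Our proposed solution costs $180K annually.")
safe_prompt = sanitize(draft)   # this is what you paste into the AI tool
print(safe_prompt)
# After the AI returns its draft, run restore() on the output locally:
# final_text = restore(ai_output)
```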
Building this habit also changes how you think about AI tool selection for different tasks. Once you're clear on what can and can't go into a public tool, you naturally start routing tasks accordingly. Non-sensitive drafting, brainstorming, research synthesis, template creation, and communication polishing can all go through free or consumer-tier tools without concern. Tasks involving the six protected categories either get sanitized first, or they get routed to enterprise-tier tools your organization has provisioned — or they stay in internal systems entirely. This tiered approach isn't restrictive. It's what lets you use AI aggressively for 80% of your work while protecting the 20% that actually carries risk.
How Exposure Risk Varies by Role
- Managers and team leads: Your highest risk is employee data — performance notes, compensation discussions, and disciplinary matters. Never paste employee details into public AI tools, even with names removed, if other context makes the person identifiable.
- Marketing professionals: Client names, campaign budgets, and unreleased product messaging are your exposure points. Sanitize client identifiers and use placeholder figures when asking AI to analyze campaign performance or draft briefs.
- Financial analysts: Revenue data, margin figures, forecasts, and M&A-related analysis are all in the proprietary business category. Work with anonymized or rounded figures in public tools; use enterprise tools or local models for anything exact.
- Consultants: You hold client data under implicit or explicit NDA. Every client name, engagement detail, or proprietary process you've observed is off-limits for public AI tools. Your reputation and your firm's liability depend on this boundary.
- HR professionals: Compensation data, benefits details, accommodation records, and anything touching medical information falls under both company policy and legal protection in most jurisdictions. This category warrants the strictest discipline of any role.
- Legal and compliance teams: Pending litigation details, regulatory correspondence, and legal strategy are almost never appropriate for public AI tools. Even enterprise tools should be evaluated carefully for these use cases.
Goal: Identify your actual current exposure from past AI tool use, practice the substitution technique on real prompts from your own work, and establish a personal audit habit that prevents future accidental data sharing.
1. Open your chat history in whichever AI tool you use most frequently — ChatGPT, Claude, Gemini, or Perplexity. If you've disabled history, reconstruct from memory the last five work-related prompts you submitted.
2. For each prompt, identify whether you included any of the six protected categories: personal data, proprietary business information, client/customer data, credentials, legal/compliance materials, or employee information.
3. Create a simple log — a notes document or spreadsheet, or the small script sketched after these steps — with three columns: Prompt Topic, Category Present (Yes/No), and Specific Data Included.
4. For each prompt where you flagged a category, write a one-sentence description of what the actual risk would be if that data appeared in a training corpus or was reviewed by a vendor employee.
5. For each risky prompt, rewrite it using the substitution technique: replace all sensitive specifics with placeholders (Client A, Revenue X, Employee 1) while preserving the task structure.
6. Check your current account settings on the tool you use most. Navigate to privacy or data controls and confirm whether you have training opt-out enabled. Screenshot or note the current setting.
7. Based on your audit, identify which of your regular AI use cases need to move to an enterprise-tier tool, which can continue with sanitization, and which are already low-risk. Write down one sentence for each category.
8. Share your findings with one colleague — a direct report, peer, or manager — and walk them through the substitution technique using one of your rewritten prompts as the example.
9. Set a recurring 2-minute personal check: before submitting any work-related AI prompt this week, pause and ask 'Does this contain any of the six categories?' until the habit becomes automatic.
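If you'd rather keep the log from steps 2-4 outside a spreadsheet app, the same three-column structure can live in a small script. This is a minimal sketch, assuming a plain CSV file named ai_prompt_audit.csv; the example rows are hypothetical and stand in for your own audit entries.

```python
import csv

# A minimal sketch of the three-column audit log from steps 2-4.
# The rows below are hypothetical examples; replace them with your own audit.
FIELDS = ["Prompt Topic", "Category Present (Yes/No)", "Specific Data Included"]

audit_rows = [
    {"Prompt Topic": "Q3 board update draft",
     "Category Present (Yes/No)": "Yes",
     "Specific Data Included": "Client names and exact pipeline figures"},
    {"Prompt Topic": "Rewrite of a published press release",
     "Category Present (Yes/No)": "No",
     "Specific Data Included": ""},
]

with open("ai_prompt_audit.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(audit_rows)

flagged = sum(row["Category Present (Yes/No)"] == "Yes" for row in audit_rows)
print(f"{flagged} of {len(audit_rows)} audited prompts contained protected data.")
```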
Key Takeaways from This Section
- Public AI tools — including paid consumer tiers of ChatGPT, Claude, and Gemini — operate under consumer-grade data terms, not enterprise agreements. The tier you're on changes your risk profile significantly.
- Six categories of information require consistent protection: personal data, proprietary business information, client data, credentials, legal/compliance materials, and employee information.
- The substitution technique — replacing sensitive specifics with placeholders before submitting — preserves nearly all of an AI tool's usefulness while eliminating the primary exposure vector.
- Enterprise tiers of tools like ChatGPT Enterprise, Claude for Enterprise, GitHub Copilot Business, and Google Workspace with Gemini offer contractual data protection that consumer tiers do not.
- Your role determines your highest-risk category: managers face employee data exposure, consultants face client NDA risk, analysts face proprietary data risk, and HR professionals face the strictest legal constraints.
- The Samsung incident — three separate data leaks via ChatGPT in under a month — demonstrates that this risk is not theoretical. It happens to technically sophisticated professionals in well-resourced organizations.
It's Tuesday afternoon. Your CFO asks you to prepare a briefing on why Q3 margins dropped. You have the data — a spreadsheet with product-level costs, supplier names, and actual vs. budgeted figures. The fastest move seems obvious: paste it into ChatGPT, ask it to identify the key drivers, and have a draft ready in 20 minutes. Most professionals in that moment don't pause. The deadline pressure is real, the tool is right there, and the output would genuinely be useful. But that spreadsheet contains supplier contract terms, internal cost structures, and financial figures that haven't been disclosed publicly. Sending it to a public AI model means it leaves your network entirely, processed on infrastructure you don't control, under terms of service that — unless your company has an enterprise agreement — do not guarantee confidentiality or data deletion.
The Data Categories That Carry the Highest Risk
Part 1 established the core principle: public AI tools are not confidential environments. Now let's get specific about which data categories create real exposure when shared. Financial data is the most commonly mishandled. This includes budget figures, revenue projections, margin analyses, cost breakdowns, and anything tied to earnings that hasn't been publicly disclosed. Under securities regulations in most jurisdictions, sharing material non-public financial information — even accidentally, even with an AI tool — can create legal liability. The AI doesn't trade on it, but your organisation's data governance and legal teams care deeply about where that information travels. If the model is later queried by someone else and surfaces related patterns, the chain of exposure becomes very hard to trace.
Personnel data is the second high-risk category, and it's the one most professionals underestimate. When you ask an AI to help draft a performance improvement plan, write a termination letter, or summarise feedback from a 360 review, you're feeding in information about a real, identifiable person — their name, their role, their manager's assessment of their work, possibly their compensation. Under GDPR in Europe, CCPA in California, and equivalent laws across dozens of other jurisdictions, employee data has explicit protections. Processing it through a third-party AI tool without a data processing agreement almost certainly violates those frameworks. The fact that you're using the tool to write a document, rather than to store data, doesn't change the legal exposure. The data was processed externally. That's what counts.
Client and customer data sits in the same risk tier. This includes client names paired with project details, customer lists with contact information, deal values tied to named accounts, and any personally identifiable information collected in the course of business. Most professional services firms, consultancies, and agencies have client confidentiality clauses in their contracts. Sharing a client's strategic situation — even in anonymised form — can still breach confidentiality if the context is specific enough to identify them. A prompt like 'my client is a mid-sized European logistics firm facing X, Y, Z challenges' may feel abstract, but if that description is distinctive, you've just described a real organisation to a model whose data handling you cannot audit.
The Aggregation Problem
What 'Anonymised' Actually Means in Practice
Many professionals believe that removing names and company identifiers makes data safe to share with public AI tools. This belief is partially correct but frequently over-applied. True anonymisation requires removing or generalising all attributes that could re-identify a subject — not just the obvious ones like name and email. A dataset showing monthly revenue by product line, geographic region, and customer segment for a specific industry vertical may contain no names whatsoever and still be re-identifiable by a competitor or counterparty who knows the market. Research on data re-identification consistently finds that 87% of Americans can be uniquely identified using just three data points: zip code, gender, and date of birth. The threshold for 'safe' anonymisation is much higher than most professionals apply in practice.
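One practical way to test whether a "no names" dataset is actually anonymised is a k-anonymity style check: count how many records share each combination of quasi-identifiers, and treat any combination that appears only once or a handful of times as re-identifiable. The sketch below is a minimal illustration in Python; the records, column names, and threshold are hypothetical and should be adapted to the data you intend to share.

```python
from collections import Counter

# A minimal k-anonymity style check. Records, quasi-identifier columns, and the
# threshold are hypothetical; adapt them to the dataset you intend to share.
records = [
    {"region": "Benelux", "segment": "Enterprise", "product_line": "Last-mile"},
    {"region": "Benelux", "segment": "Enterprise", "product_line": "Freight"},
    {"region": "DACH", "segment": "SMB", "product_line": "Last-mile"},
]
QUASI_IDENTIFIERS = ("region", "segment", "product_line")

def smallest_group(rows, keys):
    """Size of the rarest combination of quasi-identifier values."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())

k = smallest_group(records, QUASI_IDENTIFIERS)
if k < 5:  # illustrative threshold; choose one appropriate to your context
    print(f"Smallest group has {k} record(s): rows may be re-identifiable even without names.")
```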
Prompt
Here's the situation: our client Meridian Logistics, a €240M revenue company based in Rotterdam, is losing margin on their last-mile delivery contracts. Their three biggest accounts — Kaufland, Rewe, and a third unnamed retailer — are pushing for 8% price reductions. Help me structure a negotiation strategy.
AI Response
This prompt exposes the client's name, revenue, headquarters, named customer relationships, and the specific commercial pressure they're under. If this prompt is stored, used in training, or accessed by anyone outside your session, you've disclosed confidential client information to an uncontrolled third party. This is a real breach risk regardless of how helpful the output would be.
Prompt
A logistics company with ~€200-300M revenue is facing pressure from its top retail accounts to cut last-mile delivery prices by 8%. The company currently operates on thin margins in this segment. Help me structure a negotiation strategy that protects margin while maintaining the client relationships.
AI Response
This prompt produces an equally useful strategic framework. The revenue range is broad enough to describe dozens of firms. No client name, no named accounts, no headquarters. The AI gets the context it needs to give relevant advice, and nothing in the prompt could identify the actual client even if the prompt were read by a competitor. This is the professional standard for working with AI on client matters.
Choosing the Right Tool for the Sensitivity Level
Not all AI tools carry the same risk profile. The distinction that matters most for professionals isn't capability — it's data handling. Enterprise versions of the major models offer contractual data protections that the free consumer tiers do not. If your organisation has negotiated an enterprise agreement, the legal and technical situation changes substantially. The table below maps the most widely used tools against their data handling defaults, because the right choice depends on what you're doing and what your organisation has agreed to.
| Tool | Consumer/Free Tier | Enterprise/API Tier | Data Used for Training? | Key Consideration |
|---|---|---|---|---|
| ChatGPT (OpenAI) | Prompts may be reviewed; opt-out available | ChatGPT Enterprise: data not used for training | Yes by default (free); No (Enterprise) | Enterprise requires a company-level subscription with negotiated per-seat pricing |
| Claude (Anthropic) | Prompts not used for training by default | Claude for Enterprise: BAA available, zero retention | No (all tiers) | Stronger privacy defaults than most competitors at consumer tier |
| Gemini (Google) | Prompts reviewed by humans; tied to Google account | Gemini for Workspace: data stays within org | Yes (consumer); No (Workspace) | Consumer version has significant human review exposure |
| Microsoft Copilot | Prompts may be used for model improvement | M365 Copilot: governed by Microsoft data processing terms | Yes (consumer); No (M365) | M365 Copilot is the only version suitable for corporate use |
| Perplexity | Searches logged and stored | Enterprise Pro: data isolation and SOC 2 controls | Yes (consumer) | Consumer tiers not suited for sensitive professional queries |
| Notion AI | Operates within your Notion workspace | Enterprise: SOC 2, data processing agreements available | No (workspace-contained) | Data stays in your environment; lower risk for internal docs |
The table reveals a pattern that surprises most professionals: the tool you use matters less than the tier you're on. Claude at the consumer level offers stronger default privacy protections than Gemini's free tier, which exposes prompts to human review and ties activity to your Google account. Microsoft's M365 Copilot — the version embedded in Word, Excel, and Teams — operates under enterprise data processing terms that make it suitable for internal business data, while the free Copilot at copilot.microsoft.com does not. Perplexity, widely used for research, logs searches on its consumer tiers, making those tiers inappropriate for queries involving client or competitive intelligence unless your organisation has provisioned its Enterprise Pro offering. The practical takeaway: check which tier your company has provisioned before assuming any tool is safe for sensitive work.
Quick Check Before You Paste
How This Plays Out Differently Across Roles
The specific data risks you face depend heavily on your function. The categories above apply universally, but the forms they take — and the mistakes most likely to happen — vary by role. Understanding your own exposure profile helps you build the right habits rather than a generic checklist that doesn't fit your actual workflow.
- Managers and HR professionals: Highest risk around personnel data. PIP documents, compensation discussions, org design changes, and succession planning details are all high-sensitivity. Never include real employee names or identifiable role descriptions in public AI prompts.
- Financial analysts: Material non-public financial data is the core risk. Budget models, deal valuations, and earnings forecasts should not be pasted into consumer AI tools. Use enterprise tools or anonymise figures to ranges rather than exact numbers.
- Marketing professionals: Customer segmentation data, CRM exports, and campaign performance tied to named accounts are the exposure points. Aggregate-level data (e.g., '18-35 urban professionals') is fine; individual-level or account-level data is not.
- Consultants and advisors: Client confidentiality is the primary obligation. Every prompt involving a client situation should pass the 're-identification test' — could this describe only one real organisation? If yes, anonymise further before proceeding.
- Product managers: Roadmap details, unreleased feature specifications, and pricing strategy are competitive intelligence. Sharing these with public AI tools creates IP exposure, even if no personal data is involved.
- Legal and compliance professionals: Privileged communications and work product are the obvious risks, but contract terms and regulatory filings in progress carry exposure too. Enterprise tools with zero-retention agreements are the minimum standard for this function.
- Sales professionals: Deal terms, pipeline values tied to named accounts, and competitor intelligence gathered from clients are all high-sensitivity. Anonymise account names and round deal values before using AI for deal strategy or email drafting.
Building a Sanitisation Habit That Actually Sticks
The challenge with data safety isn't knowledge — it's workflow. Professionals who understand the risks still paste sensitive data into consumer tools because the friction of sanitising feels high in the moment. The solution is to reduce that friction to near zero by building a simple, repeatable process rather than relying on case-by-case judgment. The most effective approach is a 'sanitise first, prompt second' habit: before opening any AI tool, spend 60 seconds reviewing what you're about to share. Replace specific names with role descriptors, replace exact figures with ranges, and replace identifiable project names with generic categories. This single habit prevents the vast majority of accidental data exposure without meaningfully slowing your work.
Teams that handle sensitive data regularly benefit from a shared prompt template library — a set of pre-anonymised prompt structures for the most common use cases. A consulting team might have a standard template for 'client situation analysis' that already has the right level of abstraction built in. A finance team might have templates for 'budget variance analysis' that prompt users to enter figures as percentages rather than absolutes. These aren't bureaucratic controls — they're time-savers. When the structure already exists, the cognitive load of deciding what to share drops dramatically, and compliance becomes the path of least resistance rather than a conscious effort.
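A template library doesn't need special tooling; a shared document works, and so does a short script. Below is a minimal sketch in Python using string.Template. The template names and placeholder fields are hypothetical, and the values you substitute should already be sanitised — descriptors and ranges, never real names or exact figures.

```python
from string import Template

# A minimal sketch of a shared, pre-anonymised prompt template library.
# Template names and fields are hypothetical; fill them with descriptors and
# ranges, never real names or exact figures.
TEMPLATES = {
    "client_situation_analysis": Template(
        "A $industry company with revenue in the $revenue_range range is facing "
        "$challenge. Outline the three most likely root causes and a 90-day plan."
    ),
    "budget_variance_analysis": Template(
        "A business unit missed its quarterly budget by roughly $variance_pct. "
        "The main cost drivers were $cost_drivers. Draft a variance narrative "
        "for a management review, in a neutral, factual tone."
    ),
}

prompt = TEMPLATES["client_situation_analysis"].substitute(
    industry="B2B logistics",
    revenue_range="€200-500M",
    challenge="margin pressure on last-mile delivery contracts",
)
print(prompt)  # already sanitised; safe to paste into the AI tool
```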
Goal: Create a personalised, role-specific reference document that reduces the cognitive load of safe AI use to a quick checklist, making compliant behaviour the default rather than the exception.
1. Open a blank document and title it 'AI Prompt Safety Protocol — [Your Role]'.
2. List the top five types of data you work with regularly (e.g., client financials, employee performance data, pipeline figures, product roadmaps).
3. For each data type, write one sentence describing the specific risk if shared with a public AI tool — be concrete about what regulation, contract clause, or competitive exposure applies.
4. For each data type, write a sanitisation rule: the specific substitution you'll make before including that data in a prompt (e.g., 'Replace client name with industry descriptor + size range').
5. Identify which AI tool your organisation has provisioned at enterprise tier, and note which data types are acceptable to use with it.
6. Write one example prompt for your most common AI use case, first as you'd naturally write it (with real data), then as a sanitised version.
7. Compare the two versions and confirm the sanitised version still gives the AI enough context to produce useful output.
8. Save the document somewhere you'll actually find it — your desktop, a pinned Notion page, or a shared team folder.
9. Share the sanitised prompt template with one colleague this week and ask them to do the same exercise for their role.
When the Tool Itself Asks for More Context
Modern AI tools are increasingly good at asking clarifying questions. Claude in particular will often request additional context to give a better answer. This creates a subtle risk: the model's follow-up questions can lead you to share more than you originally intended. You start with a sanitised prompt, the model asks 'what industry is this client in?' or 'what's the approximate deal size?', and you answer without thinking — because the question feels like a natural conversation. Treat every model follow-up question with the same scrutiny as your initial prompt. You are not obligated to answer. 'I'd prefer to keep that detail generic' is a completely valid response to an AI, and any capable model will work with the constraints you set.
Prompt
Model asked: 'To give you better advice on the negotiation strategy, could you share the client's industry and approximate revenue?' Your response: 'Keep the industry as B2B logistics. Revenue is in the €200-500M range. Please work with those parameters rather than asking for more specifics — I'm keeping this anonymised intentionally.'
AI Response
This response does three things well: it provides enough context for the model to improve its output, it maintains the anonymisation discipline you established in the original prompt, and it signals clearly that further probing won't yield more identifying details. The model will respect the constraint. You stay in control of what leaves your environment, and the output remains genuinely useful.
Picture this: it's Thursday afternoon and your CFO asks for a quick competitive analysis before Friday's board meeting. You open ChatGPT, paste in three paragraphs from an internal strategy document, and type "summarize the competitive risks here." The output is crisp, exactly what you needed. What you didn't notice is that those three paragraphs contained unannounced product roadmap details, a named acquisition target, and your firm's internal margin benchmarks. That data is now part of an interaction log on OpenAI's servers, potentially used in future model training unless you've configured otherwise. The CFO gets her summary. The risk stays invisible — until it isn't.
Building Your Personal Data Filter
The core skill isn't memorizing a list of banned data types. It's developing an automatic mental checkpoint that fires before you hit send. Think of it as a two-second habit: before submitting any prompt to a public AI tool like ChatGPT, Claude, or Gemini, ask yourself whether a stranger reading this prompt would gain access to something your employer, client, or regulator would object to. If the answer is yes — or even maybe — you strip that content out and replace it with a fictional equivalent or a structural placeholder. This habit costs you almost nothing in time. The alternative can cost careers.
Stripping sensitive content doesn't mean stripping usefulness. The technique is substitution, not deletion. Replace a real client name with "Client A in the financial services sector." Replace actual revenue figures with "revenue in the $50M–$100M range." Replace a named acquisition target with "a mid-market SaaS company with 200 employees." The AI still has enough structural context to give you a high-quality output. You've described the shape of the problem without handing over the proprietary details. Most professionals who adopt this approach report that their AI outputs barely change in quality — because the model needs context and structure, not your actual secrets.
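If you find yourself making the same substitutions repeatedly, the two most common ones — an exact figure becoming a range, a named entity becoming a descriptor — are easy to script. This is a minimal sketch; the bucket size and the descriptor mapping are hypothetical and should be adapted to your own data types.

```python
# A minimal sketch of the substitution technique: exact figures become ranges,
# named entities become descriptors. Bucket size and mappings are illustrative.
def to_range(value_millions: float, bucket: int = 50) -> str:
    """Round an exact figure (in $M) into a coarse range like '$50M-$100M'."""
    low = int(value_millions // bucket) * bucket
    return f"${low}M-${low + bucket}M"

DESCRIPTORS = {
    "Acme Corp": "a mid-market financial services client",
    "Pinnacle Financial": "a large banking client",
}

print(to_range(72.4))            # -> $50M-$100M
print(DESCRIPTORS["Acme Corp"])  # what goes into the prompt instead of the name
```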
The Substitution Test
Choosing the Right Tool for the Sensitivity Level
Not all AI tools carry the same privacy risk profile. The tool you use for brainstorming taglines is not necessarily the right tool for drafting an HR performance review or analyzing a client contract. Enterprise-grade versions of these tools offer data processing agreements, opt-out of training on your data, and sometimes on-premise deployment. Knowing which tier of tool you're using — and what protections it provides — is as important as what you type into it. The table below maps common professional use cases against appropriate tool choices based on their data handling policies as of 2024.
| Use Case | Risk Level | Appropriate Tool | Why |
|---|---|---|---|
| Drafting marketing copy | Low | ChatGPT (free or Plus) | No sensitive data involved; public info only |
| Summarizing internal meeting notes | Medium | ChatGPT Enterprise or Claude for Work | Data not used for training; enterprise agreements in place |
| Analyzing client financial data | High | Microsoft Copilot (M365, with DPA) | Runs within your tenant; covered by existing data agreements |
| Reviewing HR or legal documents | High | On-premise LLM or approved enterprise tool only | Regulatory and confidentiality obligations require controlled environment |
| Coding and internal scripts | Medium–High | GitHub Copilot Business (with policy controls) | Code may contain credentials or proprietary logic; Business tier disables training on your code |
| Research and summarizing public sources | Low | Perplexity, ChatGPT, Gemini | Source material is already public; minimal risk |
Prompt
Here's our Q3 pipeline data for Acme Corp, TechStart Inc, and Pinnacle Financial. Total pipeline is $4.2M. Acme is close to signing a $1.1M deal. Write a board update summarizing our sales momentum.
AI Response
This prompt exposes three real client names, a total pipeline figure, and a specific deal value. If sent to a public AI tool without enterprise data protections, all of this becomes part of an external interaction. A data breach, a subpoena, or a future training data audit could surface this information.
Prompt
Here's our Q3 pipeline summary. Total pipeline is approximately $4M across three enterprise clients in financial services and tech. Our largest opportunity is close to closing at just over $1M. Write a board update summarizing our sales momentum.
AI Response
The AI produces an equally polished board update. No client names, no exact figures, no identifiable deal details. The output is professional, useful, and completely safe to generate on a public tool. The CFO gets what she needs. Nothing leaks.
What You Do Differently Starting Monday
The practical shift isn't dramatic. It's a two-second pause before every AI interaction. You build a short personal reference — a mental or written checklist — of data categories that always require substitution before they go into any public AI tool. Real client names. Internal financial figures. Employee names tied to performance. Unreleased product details. Legal case specifics. M&A targets. Once these are on your list, the habit of substituting them becomes automatic within a few weeks. You're not slowing down your AI use — you're making it sustainable and defensible.
For managers, the bigger opportunity is setting this standard for your team before someone makes an expensive mistake. A five-minute conversation about the substitution technique — with one concrete example from your own work — does more than any policy document. If your organization uses Microsoft 365 Copilot or ChatGPT Enterprise, confirm with IT or legal that your data processing agreements are active and that training opt-outs are configured. Don't assume these protections are on by default. In many cases, they require deliberate setup by an administrator.
Analysts and consultants carry a particular responsibility because they routinely handle client data under confidentiality agreements. Those agreements were written before generative AI existed. They almost certainly prohibit sharing client data with third-party systems without explicit consent — and a public AI tool qualifies as a third-party system. Before using any AI tool on client work, check your engagement agreement and your firm's AI use policy. If neither exists yet, that gap is itself a risk worth flagging to your leadership. Being the person who raised it early is a far better position than being the person who caused the incident.
Key Differences by Role
- Managers: Your primary job is setting team norms before an incident forces you to. Model the substitution habit visibly in team settings.
- Analysts: Every dataset you touch likely contains PII or commercially sensitive figures. Default to anonymization even when it feels unnecessary.
- Marketers: Low-risk on content creation, higher-risk when feeding in CRM data, customer segments, or unreleased campaign details. Keep these separate.
- Consultants: Your confidentiality obligations extend to AI tools. Treat any public AI tool as a third party — because legally, it is.
- HR professionals: Employee names, performance data, compensation figures, and disciplinary details are never appropriate for public AI tools — regardless of how anonymized they seem.
- Finance teams: Internal P&L figures, deal economics, and forecasts should only enter AI tools with confirmed enterprise data protections in place.
Goal: Produce a personalized, role-specific data safety reference card you can use and share immediately — not a generic policy, but a practical tool built around your actual work.
1. Open a blank document or note — this becomes your permanent reference card.
2. Write the heading: "Data I Always Substitute Before Using Public AI Tools."
3. List at least six specific data types from your actual job (e.g., "client names in financial services," "Q4 revenue forecast," "employee performance scores").
4. Next to each item, write one substitution example — what you'd replace it with (e.g., "client name" → "a mid-market financial services firm").
5. Add a second section: "Tools I'm Cleared to Use for Sensitive Work" — list any enterprise AI tools your organization has approved, and note whether a data processing agreement is confirmed.
6. Add a third section: "My Two-Second Check" — write a single question you'll ask yourself before every AI prompt (make it specific to your role).
7. Share the card with one colleague or your manager and ask if they'd add anything to the sensitive data list.
8. Save the card somewhere you'll actually see it — pinned in Notion, taped near your monitor, or saved as a phone note.
9. Set a calendar reminder for 30 days from now to review and update the card based on new tools or projects your team has adopted.
- Public AI tools like ChatGPT, Claude, and Gemini are not private by default — data you submit may be stored, reviewed, or used for training unless you're on an enterprise plan with specific protections configured.
- The substitution technique — replacing real names, figures, and identifiers with fictional equivalents — preserves AI output quality while eliminating data exposure risk.
- Enterprise tools like ChatGPT Enterprise, Microsoft 365 Copilot, and GitHub Copilot Business offer stronger data protections, but these must be confirmed as active — they are not always on by default.
- Confidentiality agreements and data protection regulations (GDPR, HIPAA, CCPA) apply to AI tools. Sharing client or employee data with a public AI tool likely violates both.
- The most effective privacy habit is a two-second pause before every prompt — long enough to ask whether a stranger reading this prompt would see something sensitive.
- Managers and team leads carry an outsized responsibility: setting visible norms and confirming enterprise tool configurations before an incident, not after.
A consultant pastes a client's financial projections into ChatGPT (free tier) to help structure a presentation. What is the most significant risk?
You need to use AI to draft a summary of internal HR performance reviews. Which approach is most appropriate?
Which of the following best describes the substitution technique for safe AI use?
Your organization has just deployed Microsoft 365 Copilot for all staff. A colleague assumes all data is now automatically protected. What should you tell them?
A marketing manager uses ChatGPT to brainstorm campaign taglines, then later uses the same ChatGPT session to paste in a spreadsheet of customer email addresses to 'check for duplicates.' What went wrong?