Take Control: Build Your Data Rules
Data Governance for AI Users
Part 1: What Happens When Your Team Uses AI Without Rules
Historical Record
Samsung
In early 2023, Samsung's semiconductor division experienced multiple data breaches when engineers pasted proprietary source code and internal technical documents into ChatGPT without authorization.
Samsung's incident became a widely recognized example of governance failures in AI adoption across professional organizations.
The Samsung story spread fast because it felt familiar. Managers recognized it. HR teams recognized it. Marketers, consultants, teachers, anyone who had started quietly using AI tools at work saw themselves in those engineers. The problem wasn't the tools. The problem was the absence of any shared understanding about what data belongs inside an AI prompt and what data should never leave the building. Samsung responded by banning ChatGPT internally and eventually developed an internal AI solution with tighter controls, but the ban cost them months of productivity and trust. The engineers weren't fired; they were simply operating in a policy vacuum, doing what felt logical in the moment.
This is the defining challenge of AI adoption in professional environments right now. The tools are fast, capable, and genuinely useful. The instinct to paste in a client contract, a salary spreadsheet, or a patient record to get a faster answer is completely natural. But "natural" and "safe" are not the same thing. Data governance (the set of rules, habits, and decisions that determine how information is handled) has always existed in organizations. AI tools just broke the old fences without anyone noticing. This lesson is about building new ones, without killing the productivity gains that make AI worth using in the first place.
The Principle Behind the Story
What Samsung's situation teaches us isn't that AI tools are dangerous. It's that most organizations have an implicit data governance model that only works when information stays inside familiar systems. Email clients, CRMs, HR platforms, and document management tools all have access controls, audit logs, and terms of service that legal and IT teams have reviewed. When employees start routing work through ChatGPT, Claude, or Google Gemini on personal or unapproved accounts, that information moves outside those guardrails entirely. The principle is straightforward: data governance for AI isn't a new problem; it's an old problem wearing new clothes. The question is always the same: who can see this information, where does it go, and what happens to it after?
What "Data Governance" Actually Means at Work
How a Healthcare Team Learned This the Hard Way
Consider a mid-sized hospital network in the American Midwest, not Samsung, not a tech company, but a place where the stakes of data exposure are immediately obvious. A care coordination team started using ChatGPT to draft patient follow-up letters. The workflow was efficient: a coordinator would paste in notes from a patient's recent visit, ask ChatGPT to turn them into a warm, readable letter, then edit and send. The letters were better. Patients responded more. Coordinators saved roughly 40 minutes per day. Nobody told them to stop. Nobody told them to start. There was no policy.
The problem surfaced during a routine compliance review six months later. A HIPAA officer noticed the workflow and immediately flagged it. The consumer versions of ChatGPT, both the free tier and ChatGPT Plus, do not come with a Business Associate Agreement (BAA), the legal contract required before any vendor can handle Protected Health Information (PHI) in the United States. The hospital hadn't done anything malicious. But it had potentially violated HIPAA every time a coordinator pasted patient details into the tool, which had happened hundreds of times. The team had to stop the workflow entirely while legal reviewed the exposure. Months of productivity gains evaporated overnight.
The resolution wasn't complicated once they understood the landscape. Microsoft Azure OpenAI Service, which provides access to the same GPT models that power ChatGPT, does offer HIPAA-compliant configurations with BAAs for enterprise customers. Microsoft Copilot for Microsoft 365, when deployed through an enterprise agreement, also meets these requirements. The hospital's IT team set up an approved environment within eight weeks. The care coordinators got their workflow back. But the months of uncertainty, the legal review costs, and the compliance risk exposure were entirely avoidable with a basic data governance policy established from the start. The tool wasn't the problem. The absence of rules was.
Prompt
I need to draft a warm, professional follow-up letter for a patient who recently had a knee replacement surgery. They are 68 years old, the surgery went well, and we want to remind them about their 6-week check-up appointment and encourage them to keep up with their physical therapy exercises. Do not include any real names, dates, or identifying information. I will fill those in manually. Tone should be caring and clear, suitable for an older adult reader.
AI Response
Subject: Your Recovery Is Going Well: A Note from Our Care Team
Dear [Patient Name],
We wanted to reach out and let you know how pleased your care team is with how your recovery is progressing. Knee replacement surgery is a significant step, and the work you've put in during these early weeks truly makes a difference in your long-term outcomes.
As a reminder, your 6-week follow-up appointment is coming up soon. This visit is an important milestone: it gives your surgeon a chance to review your progress, check on healing, and answer any questions you may have. Please make sure this appointment is on your calendar. If you need to reschedule, don't hesitate to call our office.
In the meantime, please continue with your physical therapy exercises as prescribed. We know they can feel challenging, but consistency at this stage is one of the most important things you can do for your recovery. Your physical therapist is a great resource if any of the exercises feel uncomfortable or unclear.
Don't hesitate to reach out if you have any questions or concerns before your appointment. We're here to support you every step of the way.
Warm regards,
[Care Coordinator Name]
[Hospital/Clinic Name]
[Contact Information]
The Placeholder Technique
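The prompt and response above keep every identifying detail out of the AI tool and leave bracketed placeholders to be filled in by hand. For teams that send many such letters, the fill-in step could also be done by a small script that runs locally, so real details still never reach the AI tool. The following is a minimal sketch under that assumption; the function name `fill_placeholders`, the template text, and the example values are all hypothetical and are not taken from any hospital system.

```python
import re

# Hypothetical template, as an AI tool might return it (placeholders only).
TEMPLATE = (
    "Dear [Patient Name],\n"
    "Your 6-week follow-up appointment is on [Appointment Date].\n"
    "Warm regards,\n"
    "[Care Coordinator Name]"
)

# Matches any bracketed label such as [Patient Name].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each [Placeholder] with its real value on the local machine.

    Raises KeyError when a placeholder has no value, so a letter with an
    unfilled bracket is not sent by accident.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for placeholder [{key}]")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


# Illustrative values only; real details stay on the coordinator's machine.
letter = fill_placeholders(
    TEMPLATE,
    {
        "Patient Name": "Jane Example",
        "Appointment Date": "the date on your appointment card",
        "Care Coordinator Name": "Sam Example",
    },
)
print(letter)
```

Failing loudly on a missing value is a deliberate choice in this sketch: a letter that still contains "[Patient Name]" is a more likely mistake than a wrong name, and it is cheap to catch.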
The Marketing Agency That Built the Right Habits Early
Not every story ends in a compliance scramble. A boutique marketing agency in London, about 35 people, working primarily with financial services clients, decided to get ahead of the problem in late 2022, before AI tools had become ubiquitous in their workflow. The agency's managing director had read early coverage of ChatGPT and immediately saw the opportunity, but also the exposure. Their financial services clients had strict NDAs. Campaign briefs contained market positioning data, budget allocations, and competitive intelligence that clients considered highly sensitive. If that material ended up in an AI training dataset, the agency's client relationships, and potentially its legal standing, would be at serious risk.
The MD spent one afternoon writing a one-page AI usage policy. It wasn't a legal document. It was a practical guide: which tools were approved (Claude Pro with privacy settings enabled, Canva AI for design work), what categories of information were off-limits (client names, budget figures, campaign performance data, anything under NDA), and what the placeholder technique looked like in practice. She shared it in a team meeting, answered questions for 20 minutes, and pinned it to the agency's Notion workspace. Within a month, every account manager was using Claude to draft client proposals faster, every designer was using Canva AI to generate concept visuals, and not a single piece of client-sensitive data had been exposed. The policy cost her an afternoon. The productivity gains were immediate and sustained.
Comparing the Approaches: What Separates Safe AI Use from Risky AI Use
| Scenario | Tool Used | Data Involved | Governance in Place? | Outcome |
|---|---|---|---|---|
| Samsung engineers debug code | ChatGPT (consumer) | Proprietary source code, internal docs | No policy | Company-wide ban, months of disruption |
| Hospital care coordinators draft letters | ChatGPT Plus (consumer) | Patient visit notes (PHI) | No policy | HIPAA compliance risk, workflow suspended |
| Marketing agency drafts proposals | Claude Pro (privacy enabled) | Placeholder text only, no real client data | One-page policy, approved tools list | Productivity gains, zero exposure incidents |
| HR manager writes job descriptions | Microsoft Copilot (M365 enterprise) | Internal role requirements, no personal data | Enterprise agreement in place | Compliant, efficient, fully auditable |
| Consultant summarizes meeting notes | ChatGPT Plus (consumer, no enterprise plan) | Client meeting transcript with names and strategy | No policy | Potential NDA breach, client relationship risk |
What a High School Principal Figured Out That Most Executives Haven't
Shift industries entirely. A high school principal in Auckland, New Zealand, noticed that her teachers had started using ChatGPT to write report card comments, draft parent communication letters, and generate differentiated lesson plans. She wasn't opposed to any of it: the time savings were real, and the quality of written communication had visibly improved. But she had a specific concern: some teachers were pasting in student names, learning difficulties, behavioral notes, and academic performance data to get more personalized outputs. Under New Zealand's Privacy Act 2020, student information is protected personal data. The school had no legal basis for routing it through a third-party AI system without parental consent and a proper data processing agreement.
Her solution was elegant and didn't require any technical expertise. She created a simple two-column reference sheet. The left column was labeled "Safe to include": things like grade level, subject area, general learning objectives, and tone preferences. The right column was labeled "Never include": student names, student ID numbers, special education designations, behavioral incident details, and anything from a student's official file. She laminated it and put it next to every classroom computer. Teachers kept their AI-assisted workflows. The school stayed compliant. Parents weren't alarmed. The principal had done in two hours what most corporate legal teams spend months trying to formalize: she translated a complex compliance requirement into a practical, visual decision tool that non-technical people could actually use.
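A laminated sheet relies on people reading it. As a purely illustrative backstop, the "Never include" column could also be expressed as a short pre-paste check. The sketch below is an assumption about how that might look, not something the school used: the patterns, labels, and the function name `flag_never_include` are hypothetical, and simple patterns like these cannot recognize student names, so the human check still comes first.

```python
import re

# Hypothetical "Never include" patterns; a real school would define its own.
NEVER_INCLUDE = {
    "student ID number": re.compile(r"\b\d{6,}\b"),
    "special education designation": re.compile(
        r"\b(IEP|special education)\b", re.IGNORECASE
    ),
    "behavioral incident detail": re.compile(
        r"\b(suspension|detention|incident report)\b", re.IGNORECASE
    ),
}


def flag_never_include(prompt: str) -> list[str]:
    """Return the labels of every 'Never include' pattern found in the prompt."""
    return [
        label
        for label, pattern in NEVER_INCLUDE.items()
        if pattern.search(prompt)
    ]


# A prompt built only from "Safe to include" items raises no flags.
print(flag_never_include("Write a Year 9 science report comment, encouraging tone"))
# A prompt containing an ID-like number and an incident detail is flagged.
print(flag_never_include("Student 20231234 had a detention last week"))
```

An empty result means only that nothing matched these particular patterns, not that the prompt is safe.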
Build Your Own Two-Column Reference Sheet
Why the Default Settings on Consumer Tools Don't Protect You
One of the most important things to understand about popular AI tools is that their default configurations are designed for individual users, not for organizational data protection. ChatGPT's free tier, by default, uses your conversations to improve its models unless you actively opt out in settings. Google Gemini, when accessed through a personal Gmail account rather than a Google Workspace enterprise account, similarly retains data according to Google's consumer privacy policy rather than enterprise data processing terms. This distinction matters enormously. Consumer policies are written for individuals sharing personal information. Enterprise agreements are written for organizations handling client data, employee records, and proprietary business information, and they come with contractual commitments about data retention, access, and security.
The good news is that enterprise-grade versions of the most popular tools do exist, and many organizations already have access to them without realizing it. If your company uses Microsoft 365 (Outlook, Word, Excel, Teams) and has an enterprise license, Microsoft Copilot is likely available to you under data protection terms that are far more robust than the consumer version of any competing tool. If your organization uses Google Workspace for Business or Enterprise, Google Gemini within that environment operates under Google's enterprise data processing addendum, which is a very different legal arrangement than a personal Gmail account. The tool on the screen may look identical. The data handling underneath it is completely different.
This is the single most underappreciated point in workplace AI adoption. Employees using a personal ChatGPT Plus account, even a paid one, for work tasks are operating outside their organization's data governance framework, regardless of how careful they are with what they type. The account isn't connected to the organization's security policies, audit logs, or legal agreements. If something goes wrong, the organization has no visibility and no recourse. By contrast, an employee using Microsoft Copilot through a properly configured Microsoft 365 enterprise account is working inside a governed environment where the organization retains control over data, can audit usage, and has contractual protections with Microsoft. Same AI capability. Completely different risk profile. The choice of account matters as much as the choice of tool.
Putting This Into Practice: Your First Governance Action
Goal: Produce a practical, one-page reference document that your team can use immediately to determine what information is safe to include in AI tool prompts, without needing to consult legal or IT every time.
1. Open a blank document in Microsoft Word, Google Docs, or Notion, whichever your team uses daily. Title it 'AI Prompt Data Guidelines, [Your Team Name]' and add today's date.
2. Create a two-column table with the headers 'Safe to Include in AI Prompts' and 'Never Include in AI Prompts.' Make the table large enough to hold at least 10 items in each column.
3. In the 'Safe to Include' column, list at least 8 specific types of information relevant to your role, for example: general project descriptions, publicly available competitor information, anonymized customer personas, internal process steps, draft copy for review, meeting agenda topics.
4. In the 'Never Include' column, list at least 8 specific types of sensitive information, for example: client names and contract values, employee salary data, patient or student records, legal correspondence, proprietary financial forecasts, personal identification numbers.
5. Add a final row at the bottom of each column labeled 'When in doubt...'. In the Safe column write 'Use a placeholder like [Client Name] or [Budget Amount],' and in the Never column write 'Contact your manager or IT before proceeding.'
6. Below the table, add a short section titled 'Approved Tools for This Team' and list the AI tools your organization has formally approved, along with any account-type requirements (e.g., 'Microsoft Copilot via M365 enterprise account only, not personal accounts').
7. Share the document with your immediate team and ask each person to add one item to each column based on their specific role. This takes 5 minutes and makes the guide more comprehensive and more trusted.
8. Pin or post the document somewhere your team will see it regularly: a shared Notion page, a Teams channel, a printed sheet near workstations, or a pinned message in your team's Slack channel.
9. Set a calendar reminder for 60 days from now to review and update the guide, noting any new tools your team has started using or any new types of sensitive data that have come up in your work.
Key Lessons from Part 1
- Data governance for AI isn't a new problem. It's the same question of who can see sensitive information, applied to new tools that most organizations haven't yet built rules around.
- The Samsung, hospital, and consultant examples share a common root cause: capable employees using useful tools in a policy vacuum, not malicious intent.
- The type of account matters as much as the tool itself: consumer accounts (personal ChatGPT Plus, personal Gmail/Gemini) operate under very different data terms than enterprise accounts (Microsoft Copilot via M365, Google Gemini via Workspace Business/Enterprise).
- The placeholder technique, replacing real sensitive details with labeled brackets like [Client Name] or [Budget Figure], is the single most practical habit for protecting sensitive data while preserving AI productivity gains.
- Non-technical professionals can create effective data governance tools. The Auckland principal's laminated two-column sheet is more useful in practice than most corporate AI policies that run to 40 pages.
- Industry-specific regulations (HIPAA for healthcare, GDPR for EU personal data, FERPA for US student records) apply to AI tools just as they apply to email and cloud storage. The technology being new does not create a compliance exemption.
- The first governance action for any team is a simple, visible reference document that answers one question in under 10 seconds: 'Can I put this in an AI prompt?'
When Good Intentions Create Data Breaches
In early 2023, a mid-sized consulting firm in Chicago (let's call them Meridian Advisory) was under pressure to deliver faster client reports. Their senior analysts started using ChatGPT to summarize meeting notes, draft findings, and generate slide copy. Nobody told them not to. Nobody told them how, either. Within six weeks, three analysts had pasted full client financial models into the chat window, one had submitted a competitor analysis containing a client's unreleased acquisition plans, and another had uploaded an HR restructuring memo that named specific employees. No malicious intent. No rogue actors. Just busy professionals using a powerful tool without a framework to guide them.
Meridian's managing partner only discovered the problem when a client asked, during a contract renewal meeting, whether their data was being used to train AI systems. The partner didn't know the answer. That silence cost them the contract, a $400,000 annual retainer. The client's legal team had been monitoring AI governance disclosures after a competitor's data appeared in a ChatGPT response (a well-documented phenomenon from early 2023). Meridian hadn't broken any laws. But they had broken trust, and in professional services, trust is the entire product.
What Meridian experienced is now textbook in AI governance circles: the gap between tool availability and tool governance. The analysts weren't negligent; they were efficient. The problem was structural. No one had defined what data could go where, which tools were approved for which tasks, or what the firm's obligations were to clients whose data entered a third-party AI system. The principle that emerges from their story is blunt: access to an AI tool is not the same as permission to use it with any data you have.
The Default Data Assumption
Data Classification: The System That Makes Rules Manageable
Meridian's real failure wasn't using AI. It was having no data classification system. Data classification is simply a way of labeling information by how sensitive it is and what handling rules apply. Think of it like the color-coded folders some offices use: red means restricted, yellow means internal only, green means public. When your team knows which category their data falls into before they open an AI tool, the decision about what to paste or upload becomes almost automatic. The folder color tells you what to do.
Most professional organizations operate with roughly four levels of data sensitivity, even if they've never formally named them. Public data is anything you'd post on your website: press releases, job listings, general product descriptions. Internal data is everyday business information (meeting agendas, project timelines, internal process documents) that isn't secret but isn't for outsiders. Confidential data includes client information, financial details, contracts, employee records, and strategic plans. Restricted data is the most sensitive tier: personally identifiable information (PII), health records, legal documents under privilege, regulated financial data, and anything with explicit contractual confidentiality obligations.
Once you have these categories, you can build simple rules that anyone on your team can follow without a law degree. Public data? Use any AI tool freely. Internal data? Use approved tools with standard caution. Confidential data? Use only enterprise-tier tools with data processing agreements, or don't use AI at all for that specific task. Restricted data? Treat it like cash: manual processes only, or purpose-built secure environments. This isn't bureaucracy. It's the same logic as locking your filing cabinet. The AI era just made the rules more urgent.
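The four rules above are simple enough to write down as a lookup. The sketch below is one possible encoding, not an official policy engine: the names `DataLevel`, `ToolTier`, and `may_use_ai` are hypothetical, it assumes every enterprise-tier tool is also on the approved list, and it treats Restricted data as always off-limits because purpose-built secure environments fall outside a general-purpose check like this one.

```python
from enum import Enum


class DataLevel(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


class ToolTier(Enum):
    CONSUMER = "consumer"      # personal or unapproved account
    APPROVED = "approved"      # tool on the team's approved list
    ENTERPRISE = "enterprise"  # enterprise tier with a data processing agreement


def may_use_ai(level: DataLevel, tier: ToolTier) -> bool:
    """Apply the four classification rules from the text above."""
    if level is DataLevel.PUBLIC:
        return True  # any tool
    if level is DataLevel.INTERNAL:
        # Assumption: enterprise tools count as approved tools.
        return tier in (ToolTier.APPROVED, ToolTier.ENTERPRISE)
    if level is DataLevel.CONFIDENTIAL:
        return tier is ToolTier.ENTERPRISE
    # RESTRICTED: manual processes or a purpose-built secure environment,
    # which this sketch does not model.
    return False


print(may_use_ai(DataLevel.INTERNAL, ToolTier.CONSUMER))
print(may_use_ai(DataLevel.CONFIDENTIAL, ToolTier.ENTERPRISE))
```

The point of writing the rules this way is that the data level is decided first and the tool second, which mirrors the order in which the text asks people to think.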
Prompt
I manage a 12-person marketing team at a B2B software company. We want to start using AI tools like ChatGPT and Claude for content creation and campaign analysis. I need a simple, one-page data classification guide that tells my team what kinds of information they can and cannot put into AI tools. We work with customer data, campaign performance metrics, competitor research, and internal strategy documents. Write the guide in plain language, no legal jargon. Include four data levels, examples of each, and a clear rule for each level about AI tool use.
AI Response
**Marketing Team AI Data Guide: What Goes In, What Stays Out**
**Level 1: Public (Green Light)**
Examples: Blog post drafts, product descriptions, published case studies, social media copy, general industry research.
AI Rule: Use any approved tool freely. This is what AI tools are built for.
**Level 2: Internal (Yellow Light)**
Examples: Campaign calendars, internal meeting notes without client names, team process documents, general performance benchmarks.
AI Rule: Use approved tools (ChatGPT Plus, Claude Pro, Copilot). Remove names and specific company references before pasting. Don't share outputs outside the team.
**Level 3: Confidential (Orange Light)**
Examples: Customer names and contact lists, campaign ROI tied to specific accounts, competitor intelligence reports, budget figures, contract terms.
AI Rule: Use only enterprise tools with a signed data processing agreement (ChatGPT Enterprise, Microsoft Copilot for M365, Claude for Enterprise). Never use free or personal-tier accounts. When in doubt, ask your manager.
**Level 4: Restricted (Red Light)**
Examples: Customer PII (emails, phone numbers, addresses), payment data, any data covered by GDPR or CCPA, legal agreements.
AI Rule: Do not enter this data into any AI tool. Full stop. Use manual processes or consult IT and legal before proceeding.
**Quick Test:** Before you paste anything into an AI tool, ask: "Would I be comfortable if our biggest customer saw exactly what I'm submitting?" If the answer is no, move up a level.
Healthcare Gets It Right, and Shows Everyone Else the Way
The healthcare sector has operated under strict data governance rules since HIPAA became law in 1996. So when AI tools arrived, hospital administrators and clinical teams already had mental muscles for asking: "Is this data protected? Who can see it? Where does it go?" That habit gave them a structural advantage. Atrium Health, a large hospital network in the Carolinas, began piloting AI writing tools for administrative staff in 2022: drafting patient communication letters, summarizing policy documents, generating training materials. Their governance framework was already in place: protected health information (PHI) never left approved, HIPAA-compliant systems. AI tools were evaluated against that standard before anyone touched them.
The lesson for non-healthcare professionals isn't "follow HIPAA." It's "build the habit of asking the governance question before the tool question." Atrium's staff didn't ask "Can AI help me write this?" first. They asked "What kind of data does this task involve?" first. That sequencing change is everything. A sales manager drafting a proposal doesn't need to know HIPAA. But they do need to ask whether the client details they're about to paste into Gemini are covered by a non-disclosure agreement, and whether Gemini's terms allow that. Same mental habit, different context.
Comparing AI Tools by Data Governance Strength
| AI Tool | Free/Consumer Tier: Data Used for Training? | Paid Tier Data Protection | Enterprise Option Available? | Best For |
|---|---|---|---|---|
| ChatGPT (OpenAI) | Yes by default; opt out in settings | ChatGPT Plus: opt-out available; data not used if opted out | Yes: ChatGPT Enterprise; no training on your business data by default, SOC 2 compliant | Content creation, analysis, drafting |
| Claude (Anthropic) | Free tier: conversations may be reviewed | Claude Pro: stronger protections; review their current terms | Yes: Claude for Enterprise; no training on your data | Long documents, nuanced writing, policy work |
| Microsoft Copilot (M365) | Consumer: standard Microsoft terms | Copilot for M365: data stays within your Microsoft tenant | Yes: available as an add-on for eligible M365 business and enterprise plans | Office workflows, Teams, Outlook, Excel |
| Google Gemini | Consumer: data used per Google's terms | Gemini for Workspace: data not used to train models | Yes: Google Workspace enterprise tiers | Gmail, Docs, Sheets integration |
| Notion AI | Included in paid Notion plans | Data handled per Notion's privacy terms; no AI training on your data per their policy | Yes: Notion for Enterprise with BAA available | Knowledge management, internal docs |
A Recruiting Manager's Lesson in Unintended Exposure
Consider a talent acquisition manager at a regional logistics company. She was using Claude Pro to help screen resumes and draft interview feedback summaries. It was a smart use of AI: time-consuming administrative work, clear efficiency gains. The problem emerged when she started pasting full candidate profiles (names, addresses, employment history, salary expectations) directly into her prompts. She wasn't thinking about data classification. She was thinking about getting through 40 applications before Friday. Her intentions were entirely professional. Her data handling was not.
Candidate data is among the most sensitive information a company holds. In many jurisdictions, it's covered by privacy regulations: GDPR in Europe, CCPA in California, and a growing patchwork of state-level laws in the US. When she pasted those profiles into a standard paid-tier AI tool, she potentially created a compliance exposure her company's legal team didn't know existed. The fix was straightforward once identified: use anonymized summaries ("Candidate A: 8 years logistics experience, seeking $85K") instead of real names and details. Same AI benefit, far less data risk. The principle, anonymize before you analyze, became a standing rule for her entire HR team.
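The "anonymize before you analyze" rule can be sketched as a small local step that runs before anything is pasted into an AI tool. The example below is a minimal illustration under stated assumptions: the record fields (`name`, `address`, `years_experience`, `field`, `salary_expectation`) and the function names are hypothetical, and free-text fields such as detailed employment history can still identify a person, so they are left out of the summary entirely.

```python
from string import ascii_uppercase


def label_for(index: int) -> str:
    """Spreadsheet-style labels: 0 -> Candidate A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = ascii_uppercase[remainder] + letters
    return f"Candidate {letters}"


def anonymize_candidates(candidates: list[dict]) -> tuple[list[str], dict[str, str]]:
    """Return (summaries safe to paste, local key mapping labels to names).

    The key never leaves the local machine; only the summaries go to the tool.
    """
    summaries: list[str] = []
    key: dict[str, str] = {}
    for index, candidate in enumerate(candidates):
        label = label_for(index)
        key[label] = candidate["name"]
        salary_k = candidate["salary_expectation"] // 1000
        summaries.append(
            f"{label}: {candidate['years_experience']} years "
            f"{candidate['field']} experience, seeking ${salary_k}K"
        )
    return summaries, key


# Illustrative record only, not real candidate data.
summaries, key = anonymize_candidates(
    [
        {
            "name": "Jane Example",
            "address": "1 Example Street",
            "years_experience": 8,
            "field": "logistics",
            "salary_expectation": 85000,
        }
    ]
)
print(summaries[0])
```

Keeping the label-to-name key as a separate return value makes the boundary explicit: the first value may go into a prompt, the second may not.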
The Anonymization Shortcut
Building a Practical Governance Habit for Your Team
Governance frameworks fail when they live in PDFs that nobody reads. The organizations that actually protect themselves are the ones that build governance into the moment of use: the second before someone presses send or paste. This means making the right behavior the easy behavior. A one-page cheat sheet on every desk. A Slack pinned message with the four data levels. A 10-minute onboarding conversation the first time someone joins a team that uses AI tools. These aren't bureaucratic hurdles. They're the professional equivalent of washing your hands before surgery. Basic, fast, and catastrophically important.
The most effective teams also establish a simple escalation path. When someone isn't sure whether data is safe to use with an AI tool, they need to know exactly who to ask and how fast they'll get an answer. If the answer takes three days, people will skip the question and take their chances. A designated "AI tool owner", often a tech-savvy team member, an operations manager, or an IT liaison, who can answer governance questions within a few hours makes the whole system work. This person doesn't need to be a data scientist. They need to know the company's tool approvals, the data classification tiers, and who in legal or IT to escalate genuinely complex questions to.
There's also the matter of outputs, not just inputs. When an AI tool generates a report, a client email, or a hiring recommendation, that output may contain inferred information about real people or real business situations. Sharing that output externally (forwarding it to a client, publishing it, including it in a legal filing) carries its own governance obligations. Did the AI hallucinate a fact that's now in a client proposal? Did it reproduce a sentence that sounds like it came from a competitor's proprietary document? Reviewing AI outputs before they leave your organization isn't just quality control. It's governance at the output layer, and it's just as important as governance at the input layer.
Goal: Create a practical, one-page data governance reference your team can use immediately when deciding what information is safe to put into AI tools.
1. Open a blank document in Word, Google Docs, or Notion, wherever your team already works and will actually see it.
2. Write a title at the top: '[Your Team Name] AI Tool Data Guide' and add today's date so people know it's current.
3. Create four clearly labeled sections: Public, Internal, Confidential, and Restricted. Use color coding if your tool supports it (green, yellow, orange, red).
4. Under each section, write 3-5 examples of real data your specific team handles. Use your actual job titles, document types, and client situations, not generic placeholders.
5. Under each section, write one clear rule: which AI tools are approved for that data level, and what precautions apply (e.g., 'anonymize names before pasting').
6. Add a 'When In Doubt' box at the bottom that names one specific person on your team or in your organization who can answer AI data questions. Include their name and preferred contact method.
7. Share a draft with one colleague and ask them to find one scenario from their actual workday that the guide doesn't clearly cover. Use their answer to add one more example.
8. Save the final version somewhere your whole team can find it in under 30 seconds: a pinned Slack message, a shared drive folder, or a Notion page linked from your team homepage.
9. Set a calendar reminder for 90 days from now to review the guide and update it based on any new tools your team has adopted or new data types you're handling.
Key Principles from This Section
- Access to an AI tool is not the same as permission to use any data you have with that tool. Those are two separate decisions.
- Data classification gives your team a decision-making shortcut: know the category, know the rule, skip the guesswork.
- The four practical tiers (Public, Internal, Confidential, Restricted) apply to virtually every professional context, regardless of industry.
- Consumer and enterprise tiers of the same AI tool have meaningfully different data protections. The tier your team uses matters legally and contractually.
- Anonymizing data before it enters an AI tool eliminates the most common source of accidental exposure without reducing the quality of AI assistance.
- Governance at the output layer is as important as governance at the input layer: review what the AI produces before it leaves your organization.
- The best governance systems are frictionless: one-page guides, named point people, fast escalation paths, and built-in habits rather than policies buried in handbooks.
What to Carry Forward
Every professional using AI tools today is making data governance decisions, whether they know it or not. The Meridian Advisory story, the recruiting manager's candidate profiles, the hospital network that already had its framework in place: these aren't cautionary tales about technology. They're stories about professional judgment in a new environment. The tools are fast and genuinely useful. The governance habits that protect your clients, your colleagues, and your organization don't have to be slow or complicated. They just have to exist, clearly, accessibly, and in the hands of everyone who opens a chat window.
Building a Data Governance Habit That Actually Sticks
In 2023, Samsung's semiconductor division made headlines for the wrong reasons. Within weeks of granting employees access to ChatGPT, engineers had pasted proprietary chip design code and internal meeting notes directly into the tool, in three separate incidents. The data left Samsung's control the moment it hit OpenAI's servers. Samsung's response was swift: the company banned ChatGPT internally and began building a private AI system. But the damage was done, and the lesson was stark: when you give professionals a powerful tool without clear rules, they use it in the most efficient way they know. Nobody was trying to cause harm. They were just trying to do their jobs faster.
What Samsung experienced wasn't a technology failure. It was a governance failure. The company had no policy telling employees what could and couldn't be shared with external AI tools. There was no training, no classification system for sensitive data, no moment where someone asked: 'Before you paste that, does this data belong in a third-party system?' The engineers acted rationally inside a vacuum of guidance. This is the core tension in AI data governance: the tools are frictionless by design, and professional habits don't automatically account for where data goes once you hit send.
The principle Samsung's story gives us is uncomfortable but useful: governance gaps get filled by individual judgment, and individual judgment under deadline pressure almost always prioritizes speed. That's not a character flaw; it's human nature. The fix isn't surveillance or bans. It's building simple, memorable rules that professionals can apply in the moment, without needing a lawyer or an IT ticket.
What Data Governance Actually Means for AI Users
The Nurse, the Spreadsheet, and the Patient Who Never Consented
A nursing home administrator in the UK (let's call her Claire) was drowning in staffing reports. She discovered that pasting shift data into ChatGPT could generate a readable weekly summary in seconds, saving her nearly two hours every Friday. For three months, she did exactly that. The spreadsheets included staff names, patient-to-staff ratios by ward, and timestamps that could be cross-referenced to identify which patients received reduced care on which days. Claire wasn't sharing patient names directly. But the combination of data was enough to be considered personal health information under UK GDPR. She hadn't done anything malicious. She had done something efficient, inside a governance vacuum.
Claire's story illustrates what privacy experts call the aggregation problem: individually harmless data points become sensitive when combined. A ward name isn't sensitive. A shift time isn't sensitive. A care ratio isn't sensitive. Together, they paint a picture that could identify patients and expose the facility to regulatory risk. The UK Information Commissioner's Office has issued guidance specifically warning healthcare administrators about this pattern. Claire's facility eventually implemented a simple rule: no spreadsheet containing operational data tied to care delivery goes into a public AI tool. They adopted Microsoft Copilot inside their existing Microsoft 365 environment instead, where data stayed within their tenancy.
The lesson Claire's case teaches isn't 'don't use AI.' It's 'know your data before you share it.' Healthcare is an extreme example because the regulations are explicit, but the aggregation problem exists in every industry. A sales manager sharing pipeline data, a teacher uploading student performance spreadsheets, and a consultant pasting client financials can all create exposure through combination, even when no single data point looks dangerous on its own.
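For readers comfortable with a little scripting, one rough way to see the aggregation problem in a spreadsheet is to count how many rows share each combination of values. The sketch below is illustrative only: the function name `rare_combinations`, the sample rows, and the threshold of 2 are assumptions made up for this example, and passing this check is not a legal or regulatory test.

```python
from collections import Counter

# Hypothetical shift data, invented for illustration.
SAMPLE_ROWS = [
    {"ward": "A", "shift": "night", "ratio": "1:8"},
    {"ward": "A", "shift": "day", "ratio": "1:5"},
    {"ward": "B", "shift": "night", "ratio": "1:5"},
    {"ward": "B", "shift": "day", "ratio": "1:8"},
]


def rare_combinations(rows, fields, min_group_size=2):
    """Return value combinations shared by fewer than min_group_size rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Each field alone is shared by several rows, so nothing stands out.
print(rare_combinations(SAMPLE_ROWS, ["ward"]))
print(rare_combinations(SAMPLE_ROWS, ["shift"]))
# Combined, every ward-and-shift pair points at exactly one row.
print(rare_combinations(SAMPLE_ROWS, ["ward", "shift"]))
```

In this sample, neither column is unusual on its own, but together they single out individual rows. That is the pattern to look for before pasting a table into a public tool.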
Prompt
I manage an HR team of 8 people at a mid-sized financial services company. We handle employee data including salaries, performance reviews, disciplinary records, and recruitment information. I want to create a simple one-page policy that tells my team what they can and cannot share with AI tools like ChatGPT or Claude. Make it practical, not legal jargon. Use plain language. Include a quick-reference checklist they can keep at their desk.
AI Response
**HR Team AI Usage Policy: Quick Reference**

**What you can share with AI tools (ChatGPT, Claude, etc.):**
- Job description drafts (before posting, no internal salary bands)
- Generic email templates and communication frameworks
- Training content outlines and onboarding checklists
- Policy document drafts that contain no employee-specific information
- Interview question banks and competency frameworks

**What you must never share with external AI tools:**
- Employee names combined with salary, performance, or disciplinary data
- Recruitment data that identifies specific candidates
- Any document containing national insurance numbers, dates of birth, or home addresses
- Grievance or investigation records
- Anything marked CONFIDENTIAL or RESTRICTED

**The 10-Second Check (ask before you paste):**
1. Does this contain a real person's name + any sensitive detail? → Don't share.
2. Could someone identify an employee from this data, even without a name? → Don't share.
3. Would I be comfortable if this appeared in a data breach report? → If no, don't share.

**Approved tools for sensitive HR work:** Microsoft Copilot within our M365 environment only.

**When in doubt:** Remove all names and identifiers, use placeholder text (Employee A, Department X), then share the anonymized version. Questions? Contact the Data Protection Lead before using AI on any unfamiliar data type.
How a Marketing Agency Turned a Crisis Into a Framework
A boutique marketing agency in Toronto discovered that three account managers had been using Claude and ChatGPT to draft client campaign strategies, pasting in client briefs, audience research, and competitive analysis that were explicitly marked confidential under their client contracts. The agency found out when a client asked why elements of their 'proprietary strategy' had appeared in a competitor's campaign six months later. Direct causation was impossible to prove, but the client terminated the contract. The agency lost $340,000 in annual revenue and spent months rebuilding trust across their remaining client base.
What the agency built afterward became their competitive differentiator. They created a three-tier data classification system of Green (freely shareable), Amber (shareable with anonymization), and Red (internal tools only), and trained every account manager on it in a single 90-minute workshop. They switched client-facing AI work to Claude Pro's Projects feature, which doesn't use conversations for training by default, and documented this in client contracts as a trust signal. Within a year, they were winning pitches partly because of their 'AI-safe client data' policy. The crisis became the credential.
| Data Type | Examples | Safe for Public AI Tools? | Recommended Approach |
|---|---|---|---|
| Green: Public / Generic | Industry trends, job description templates, general frameworks | Yes | Use freely in ChatGPT, Claude, Gemini |
| Amber: Internal / Operational | Team processes, anonymized reports, draft policies | With caution | Remove names/identifiers first; use enterprise tools where possible |
| Red: Confidential / Personal | Client contracts, employee records, financial data, patient info | No | Use only in approved internal tools (Copilot M365, private deployments) |
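If your team keeps its guide in a script or internal tool rather than a document, the table can be expressed as a simple lookup. This is a minimal sketch, not a prescribed implementation: the names `Tier`, `DATA_TIERS`, `GUIDANCE`, and `guidance_for` are hypothetical, the entries are a sample drawn from the table, and treating unknown data types as Red is an assumption chosen here to err on the side of caution.

```python
from enum import Enum


class Tier(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


# Sample entries taken from the table above; replace with your team's real data types.
DATA_TIERS = {
    "industry trends": Tier.GREEN,
    "job description template": Tier.GREEN,
    "team process notes": Tier.AMBER,
    "draft policy": Tier.AMBER,
    "client contract": Tier.RED,
    "employee record": Tier.RED,
}

GUIDANCE = {
    Tier.GREEN: "OK to use in public AI tools.",
    Tier.AMBER: "Remove names and identifiers first; prefer enterprise tools.",
    Tier.RED: "Approved internal tools only.",
}


def guidance_for(data_type):
    """Look up the tier and rule for a data type, ignoring case and outer spaces."""
    # Assumption: anything not listed is treated as RED until someone classifies it.
    tier = DATA_TIERS.get(data_type.strip().lower(), Tier.RED)
    return tier, GUIDANCE[tier]


print(guidance_for("Client Contract"))
print(guidance_for("quarterly board pack"))  # not listed, so falls back to RED
```

The fallback is the design choice that matters most: an unlisted data type gets the strictest rule rather than slipping through as Green.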
The Teacher Who Anonymized Everything and Slept Better
Maria teaches Year 9 English at a secondary school in Melbourne. She started using ChatGPT to generate differentiated feedback comments for student essays, a task that previously consumed four hours every Sunday. Her first instinct was to paste in the essays directly, with student names intact. A colleague mentioned FERPA-equivalent concerns under Australian privacy law, and Maria paused. She developed a simple substitution habit: before pasting any student work, she replaces the student's name with a code (S1, S2, S3) in a private notepad, generates the AI feedback, then matches it back to the real student in her own records. The whole substitution process takes 90 seconds per class set.
Maria's approach is low-tech, takes minimal training, and requires no new software. It's the kind of practical governance habit that scales because it costs almost nothing to implement. Her school has since adopted her method as a recommended practice for all teachers using AI tools for student feedback. The principle she embodies: you don't need a data governance committee to make safe choices. You need one clear rule and the habit of applying it before you type.
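Maria does her substitution by hand, and that is enough. For anyone who would rather automate the same habit, the sketch below shows one possible way to do it. The function names `pseudonymize` and `restore` and the student names are invented for illustration, and the approach assumes you already have a list of the names to replace; it will not catch nicknames, misspellings, or other identifying details in the text.

```python
import re


def pseudonymize(text, names):
    """Replace each listed name with a code (S1, S2, ...) and return text plus key."""
    key = {}
    for i, name in enumerate(names, start=1):
        code = f"S{i}"
        key[code] = name
        # Whole-word, case-insensitive match on the exact name as listed.
        text = re.sub(rf"\b{re.escape(name)}\b", code, text, flags=re.IGNORECASE)
    return text, key


def restore(text, key):
    """Swap codes back to real names; codes not in the key are left unchanged."""
    # \b keeps S1 from matching inside S10.
    return re.sub(r"\bS\d+\b", lambda m: key.get(m.group(0), m.group(0)), text)


# Hypothetical example: the key stays on your own machine, never in the prompt.
essay_notes = "Ava Chen argues well. Ben Ortiz needs evidence."
safe_text, private_key = pseudonymize(essay_notes, ["Ava Chen", "Ben Ortiz"])
print(safe_text)
print(restore("S1: strong thesis. S2: add quotes.", private_key))
```

Only `safe_text` goes into the AI tool. The key plays the role of Maria's private notepad: it is what lets you match the feedback back to real students afterward.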
The 'Newspaper Test' for AI Data Sharing
Making Governance Real in Your Daily Workflow
Data governance sounds like something that lives in a policy document nobody reads. The professionals who do it well have made it invisible: a 10-second mental check that happens before every AI interaction, like looking both ways before crossing a street. The three-tier classification system (Green, Amber, Red) is a starting point, not a bureaucratic requirement. You can apply it right now, today, without any approval from IT or legal. Look at what you typically paste into AI tools and mentally sort it. Most of it is probably Green. A small portion is Amber. A very small portion is Red. That awareness alone changes behavior.
Tool choice matters too. ChatGPT Free and the standard Claude.ai interface may use your conversations to improve their models unless you opt out in settings. This is documented in both companies' privacy policies and is easy to disable. ChatGPT Plus users can turn off training in Settings → Data Controls. Claude Pro doesn't use conversations for training by default. Microsoft Copilot within a work Microsoft 365 account keeps data within your organization's tenancy; it never leaves to train a public model. Google Workspace's Gemini operates under Google's enterprise data protection terms when accessed through a paid Workspace account. Knowing which tool you're in, and what its data terms are, is a one-time 15-minute investment that changes every interaction afterward.
The final shift is cultural, not technical. Samsung's ban didn't solve the underlying problem; it just pushed employees toward workarounds. The agencies, healthcare administrators, and teachers who handle AI data well didn't get there through restriction. They got there through clarity: clear rules, applied consistently, explained in plain language, and reviewed when tools change. That's data governance at the professional level. No IT degree required.
Goal: Create a practical, one-page reference document that classifies your typical work data and tells you, clearly, what's safe to share with AI tools and what isn't.
1. Open a blank document in Word, Google Docs, or Notion. Title it 'My AI Data Sharing Rules, [Your Name/Team].'
2. List 8-10 types of data you regularly work with (e.g., client emails, budget spreadsheets, employee feedback, meeting notes, customer names, product specs).
3. Next to each data type, write one of three labels: GREEN (safe to share as-is), AMBER (safe after anonymizing), or RED (internal tools only, never public AI).
4. For each AMBER item, write one sentence describing how you would anonymize it before sharing (e.g., 'Replace client name with CLIENT A, remove contract values').
5. Open your main AI tool (ChatGPT, Claude, or Gemini) and go to its privacy/data settings. Find and screenshot the setting that controls whether your conversations are used for model training. Note whether it is currently on or off.
6. Write one sentence at the top of your document stating which AI tools are approved for your work context and which require caution (check with your manager if unsure).
7. Share a draft of your checklist with one colleague and ask them to add any data types you missed from your shared workflow.
8. Save the final version somewhere you'll see it: pin it in your browser bookmarks, add it to your Notion workspace, or print it and keep it at your desk.
9. Set a calendar reminder for 90 days from now to review and update the checklist as your AI tool usage evolves.
- Governance gaps get filled by individual judgment under pressure. Clear rules prevent the Samsung problem before it starts.
- The aggregation problem means individually harmless data points can become sensitive when combined. Always consider what your data reveals as a set, not just as individual fields.
- A three-tier classification system (Green / Amber / Red) gives any professional a decision framework they can apply in seconds, without legal training.
- Anonymization is the practical middle ground: removing or replacing names and identifiers lets you use AI freely on Amber-category data without creating real exposure.
- Tool choice is a governance decision. ChatGPT Free, Claude Pro, Microsoft Copilot, and Google Workspace Gemini all have different data handling terms that materially affect what's safe to share where.
- Opt-out settings exist and take 60 seconds to configure. Disabling conversation training in ChatGPT or confirming Claude Pro's default policy is a one-time action with lasting impact.
- Governance works best when it's cultural, not restrictive. Teams that explain the 'why' behind data rules get better compliance than teams that issue bans.
Key Takeaways
- Data governance for AI users is not an IT function. It's a professional habit that every person using AI tools needs to develop independently.
- Before sharing anything with an AI tool, apply the 10-second check: Does this contain real people's data? Could it expose confidential information? Would it pass the newspaper test?
- Classify your work data into three tiers (Green, Amber, Red) and treat each tier differently based on the sensitivity of what it reveals.
- Anonymize Amber-category data before using public AI tools. Replace names with codes, remove identifying numbers, and use placeholder text.
- Know your tool's data terms. Enterprise versions of Copilot, Gemini Workspace, and Claude Pro offer stronger protections than free consumer tiers.
- The professionals who handle AI data governance best aren't the most technically sophisticated. They're the ones with the clearest, simplest rules applied most consistently.
