Lesson 7 of 8

Take Control: Build Your Data Rules

~26 min read | Last reviewed May 2026

This lesson counts toward: Build Fair AI Systems: A Safety Guide; Using AI Responsibly; Legal Work, Faster: AI for High-Stakes Practice

Data Governance for AI Users

Part 1: What Happens When Your Team Uses AI Without Rules

2023

Historical Record

Samsung

In early 2023, Samsung's semiconductor division experienced multiple data breaches when engineers pasted proprietary source code and internal technical documents into ChatGPT without authorization.

Samsung's incident became a widely recognized example of governance failures in AI adoption across professional organizations.

The Samsung story spread fast because it felt familiar. Managers recognized it. HR teams recognized it. Marketers, consultants, teachers, anyone who had started quietly using AI tools at work saw themselves in those engineers. The problem wasn't the tools. The problem was the absence of any shared understanding about what data belongs inside an AI prompt and what data should never leave the building. Samsung responded with a company-wide ban on the tools and eventually developed an internal AI solution with tighter controls, but the ban cost them months of productivity and trust. The engineers weren't fired; they were just operating in a policy vacuum, doing what felt logical in the moment.

This is the defining challenge of AI adoption in professional environments right now. The tools are fast, capable, and genuinely useful. The instinct to paste in a client contract, a salary spreadsheet, or a patient record to get a faster answer is completely natural. But "natural" and "safe" are not the same thing. Data governance, the set of rules, habits, and decisions that determine how information is handled, has always existed in organizations. AI tools just broke the old fences without anyone noticing. This lesson is about building new ones, without killing the productivity gains that make AI worth using in the first place.

The Principle Behind the Story

What Samsung's situation teaches us isn't that AI tools are dangerous. It's that most organizations have an implicit data governance model that only works when information stays inside familiar systems. Email clients, CRMs, HR platforms, and document management tools all have access controls, audit logs, and terms of service that legal and IT teams have reviewed. When employees start routing work through ChatGPT, Claude, or Google Gemini on personal or unapproved accounts, that information moves outside those guardrails entirely. The principle is straightforward: data governance for AI isn't a new problem, it's an old problem wearing new clothes. The question is always the same: who can see this information, where does it go, and what happens to it after?

What "Data Governance" Actually Means at Work

Data governance is simply the set of decisions about who can access information, how it gets used, and what protections are in place. For AI tools specifically, it means knowing: (1) whether the tool stores your inputs, (2) whether those inputs could be used to train AI models, (3) who at the AI company can see what you typed, and (4) whether your organization's legal or compliance obligations (GDPR, HIPAA, SOC 2, etc.) are satisfied. You don't need to be a lawyer or IT specialist to understand these basics. You just need to ask the right questions before you paste.

How a Healthcare Team Learned This the Hard Way

Consider a mid-sized hospital network in the American Midwest, not Samsung, not a tech company, but a place where the stakes of data exposure are immediately obvious. A care coordination team started using ChatGPT to draft patient follow-up letters. The workflow was efficient: a coordinator would paste in notes from a patient's recent visit, ask ChatGPT to turn them into a warm, readable letter, then edit and send. The letters were better. Patients responded more. Coordinators saved roughly 40 minutes per day. Nobody told them to stop. Nobody told them to start. There was no policy.

The problem surfaced during a routine compliance review six months later. A HIPAA officer noticed the workflow and immediately flagged it. OpenAI does not sign a Business Associate Agreement (BAA) for ChatGPT's standard consumer tiers (the free version and even ChatGPT Plus), and a BAA is the legal contract required before any vendor can handle Protected Health Information (PHI) in the United States. The hospital hadn't done anything malicious. But it had potentially violated HIPAA every time a coordinator pasted patient details into the tool, which had happened hundreds of times. The team had to stop the workflow entirely while legal reviewed the exposure. Months of productivity gains evaporated overnight.

The resolution wasn't complicated once they understood the landscape. Microsoft Azure OpenAI Service, which provides access to the same GPT models that power ChatGPT, does offer HIPAA-compliant configurations with BAAs for enterprise customers. Microsoft Copilot for Microsoft 365, when deployed through an enterprise agreement, also meets these requirements. The hospital's IT team set up an approved environment within eight weeks. The care coordinators got their workflow back. But the months of uncertainty, the legal review costs, and the compliance risk exposure were entirely avoidable with a basic data governance policy established from the start. The tool wasn't the problem. The absence of rules was.

Safe vs. Unsafe: Drafting a Follow-Up Letter Without Patient Data

Prompt

I need to draft a warm, professional follow-up letter for a patient who recently had a knee replacement surgery. They are 68 years old, the surgery went well, and we want to remind them about their 6-week check-up appointment and encourage them to keep up with their physical therapy exercises. Do not include any real names, dates, or identifying information. I will fill those in manually. Tone should be caring and clear, suitable for an older adult reader.

AI Response

Subject: Your Recovery Is Going Well: A Note from Our Care Team

Dear [Patient Name],

We wanted to reach out and let you know how pleased your care team is with how your recovery is progressing. Knee replacement surgery is a significant step, and the work you've put in during these early weeks truly makes a difference in your long-term outcomes.

As a reminder, your 6-week follow-up appointment is coming up soon. This visit is an important milestone: it gives your surgeon a chance to review your progress, check on healing, and answer any questions you may have. Please make sure this appointment is on your calendar. If you need to reschedule, don't hesitate to call our office.

In the meantime, please continue with your physical therapy exercises as prescribed. We know they can feel challenging, but consistency at this stage is one of the most important things you can do for your recovery. Your physical therapist is a great resource if any of the exercises feel uncomfortable or unclear.

Don't hesitate to reach out if you have any questions or concerns before your appointment. We're here to support you every step of the way.

Warm regards,
[Care Coordinator Name]
[Hospital/Clinic Name]
[Contact Information]

The Placeholder Technique

Notice what that prompt did: it described the situation in general terms and explicitly instructed the AI not to include identifying details, with a note that real information would be added manually afterward. This is a practical technique that works across industries. Instead of pasting a real client name, write [Client Name]. Instead of a real salary figure, write [Compensation Amount]. You keep most of the productivity benefit while keeping identifying details out of the prompt. It takes about 30 extra seconds to set up and removes the most common source of compliance exposure.
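For teams that fill in many drafts, the "add the real details afterward" step can be sketched in a few lines of Python that run on your own machine, so the real values never reach the AI tool. This is a minimal sketch, not a vetted tool: the function name `fill_placeholders`, the sample draft, and the sample values are all hypothetical, and it assumes placeholders are written in square brackets as in the letter above.

```python
import re

# Matches bracketed labels such as [Patient Name]
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_placeholders(draft: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Fill [Label] placeholders locally, after the AI has written the draft.

    Returns the filled text and the labels that still have no value,
    so a letter is not sent with a stray bracket in it.
    """
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        label = match.group(1)
        if label in values:
            return values[label]
        if label not in missing:
            missing.append(label)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(substitute, draft), missing


# Hypothetical draft and values, for illustration only
draft = (
    "Dear [Patient Name], your check-up is on [Appointment Date]. "
    "Warm regards, [Care Coordinator Name]"
)
filled, missing = fill_placeholders(
    draft,
    {"Patient Name": "Jordan Example", "Appointment Date": "a date you confirm"},
)
print(filled)
print("Still to fill in:", missing)
```

The list of missing labels doubles as a last check before sending: if it is not empty, the draft is not finished.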

The Marketing Agency That Built the Right Habits Early

Not every story ends in a compliance scramble. A boutique marketing agency in London, about 35 people, working primarily with financial services clients, decided to get ahead of the problem in late 2022, before AI tools had become ubiquitous in their workflow. The agency's managing director had read early coverage of ChatGPT and immediately saw the opportunity, but also the exposure. Their financial services clients had strict NDAs. Campaign briefs contained market positioning data, budget allocations, and competitive intelligence that clients considered highly sensitive. If that material ended up in an AI training dataset, the agency's client relationships, and potentially its legal standing, would be at serious risk.

The MD spent one afternoon writing a one-page AI usage policy. It wasn't a legal document. It was a practical guide: which tools were approved (Claude Pro with privacy settings enabled, Canva AI for design work), what categories of information were off-limits (client names, budget figures, campaign performance data, anything under NDA), and what the placeholder technique looked like in practice. She shared it in a team meeting, answered questions for 20 minutes, and pinned it to the agency's Notion workspace. Within a month, every account manager was using Claude to draft client proposals faster, every designer was using Canva AI to generate concept visuals, and not a single piece of client-sensitive data had been exposed. The policy cost her an afternoon. The productivity gains were immediate and sustained.

Comparing the Approaches: What Separates Safe AI Use from Risky AI Use

Scenario	Tool Used	Data Involved	Governance in Place?	Outcome
Samsung engineers debug code	ChatGPT (consumer)	Proprietary source code, internal docs	No policy	Company-wide ban, months of disruption
Hospital care coordinators draft letters	ChatGPT Plus (consumer)	Patient visit notes (PHI)	No policy	HIPAA compliance risk, workflow suspended
Marketing agency drafts proposals	Claude Pro (privacy enabled)	Placeholder text only, no real client data	One-page policy, approved tools list	Productivity gains, zero exposure incidents
HR manager writes job descriptions	Microsoft Copilot (M365 enterprise)	Internal role requirements, no personal data	Enterprise agreement in place	Compliant, efficient, fully auditable
Consultant summarizes meeting notes	ChatGPT Plus (consumer, no enterprise plan)	Client meeting transcript with names and strategy	No policy	Potential NDA breach, client relationship risk

Five common AI usage scenarios compared across governance readiness and outcomes. The pattern is consistent: the tool matters less than whether clear rules exist before the work starts.

What a High School Principal Figured Out That Most Executives Haven't

Shift industries entirely. A high school principal in Auckland, New Zealand, noticed that her teachers had started using ChatGPT to write report card comments, draft parent communication letters, and generate differentiated lesson plans. She wasn't opposed to any of it: the time savings were real, and the quality of written communication had visibly improved. But she had a specific concern: some teachers were pasting in student names, learning difficulties, behavioral notes, and academic performance data to get more personalized outputs. Under New Zealand's Privacy Act 2020, student information is protected personal data. The school had no legal basis for routing it through a third-party AI system without parental consent and a proper data processing agreement.

Her solution was elegant and didn't require any technical expertise. She created a simple two-column reference sheet. The left column was labeled "Safe to include": things like grade level, subject area, general learning objectives, and tone preferences. The right column was labeled "Never include": student names, student ID numbers, special education designations, behavioral incident details, and anything from a student's official file. She laminated it and put it next to every classroom computer. Teachers kept their AI-assisted workflows. The school stayed compliant. Parents weren't alarmed. The principal had done in two hours what most corporate legal teams spend months trying to formalize: she translated a complex compliance requirement into a practical, visual decision tool that non-technical people could actually use.

Build Your Own Two-Column Reference Sheet

You can do exactly what that principal did, right now, for your own role. Take a blank document and create two columns: "Safe to include in AI prompts" and "Never include in AI prompts." Spend 10 minutes filling it in based on your specific job. Think about client names, financial figures, employee performance data, medical information, legal correspondence, and anything your organization considers confidential. Share it with your team. Pin it somewhere visible. This single artifact, which takes no technical skill to create, does more to protect your organization than any amount of policy language that nobody reads.

Why the Default Settings on Consumer Tools Don't Protect You

One of the most important things to understand about popular AI tools is that their default configurations are designed for individual users, not for organizational data protection. ChatGPT's free tier, by default, uses your conversations to improve its models unless you actively opt out in settings. Google Gemini, when accessed through a personal Gmail account rather than a Google Workspace enterprise account, similarly retains data according to Google's consumer privacy policy rather than enterprise data processing terms. This distinction matters enormously. Consumer policies are written for individuals sharing personal information. Enterprise agreements are written for organizations handling client data, employee records, and proprietary business information, and they come with contractual commitments about data retention, access, and security.

The good news is that enterprise-grade versions of the most popular tools do exist, and many organizations already have access to them without realizing it. If your company uses Microsoft 365 (Outlook, Word, Excel, Teams) and has an enterprise license, Microsoft Copilot is likely available to you under data protection terms that are far more robust than the consumer version of any competing tool. If your organization uses Google Workspace for Business or Enterprise, Google Gemini within that environment operates under Google's enterprise data processing addendum, which is a very different legal arrangement than a personal Gmail account. The tool on the screen may look identical. The data handling underneath it is completely different.

This is the single most underappreciated point in workplace AI adoption. Employees using a personal ChatGPT Plus account, even a paid one, for work tasks are operating outside their organization's data governance framework, regardless of how careful they are with what they type. The account isn't connected to the organization's security policies, audit logs, or legal agreements. If something goes wrong, the organization has no visibility and no recourse. By contrast, an employee using Microsoft Copilot through a properly configured Microsoft 365 enterprise account is working inside a governed environment where the organization retains control over data, can audit usage, and has contractual protections with Microsoft. Same AI capability. Completely different risk profile. The choice of account matters as much as the choice of tool.

Putting This Into Practice: Your First Governance Action

Create Your Team's AI Data Classification Guide

Goal: Produce a practical, one-page reference document that your team can use immediately to determine what information is safe to include in AI tool prompts, without needing to consult legal or IT every time.

1. Open a blank document in Microsoft Word, Google Docs, or Notion, whichever your team uses daily. Title it 'AI Prompt Data Guidelines, [Your Team Name]' and add today's date.
2. Create a two-column table with the headers 'Safe to Include in AI Prompts' and 'Never Include in AI Prompts.' Make the table large enough to hold at least 10 items in each column.
3. In the 'Safe to Include' column, list at least 8 specific types of information relevant to your role, for example: general project descriptions, publicly available competitor information, anonymized customer personas, internal process steps, draft copy for review, meeting agenda topics.
4. In the 'Never Include' column, list at least 8 specific types of sensitive information, for example: client names and contract values, employee salary data, patient or student records, legal correspondence, proprietary financial forecasts, personal identification numbers.
5. Add a final row at the bottom of each column labeled 'When in doubt...'. In the Safe column write 'Use a placeholder like [Client Name] or [Budget Amount],' and in the Never column write 'Contact your manager or IT before proceeding.'
6. Below the table, add a short section titled 'Approved Tools for This Team' and list the AI tools your organization has formally approved, along with any account-type requirements (e.g., 'Microsoft Copilot via M365 enterprise account only, not personal accounts').
7. Share the document with your immediate team and ask each person to add one item to each column based on their specific role. This takes 5 minutes and makes the guide more comprehensive and more trusted.
8. Pin or post the document somewhere your team will see it regularly: a shared Notion page, a Teams channel, a printed sheet near workstations, or a pinned message in your team's Slack channel.
9. Set a calendar reminder for 60 days from now to review and update the guide, noting any new tools your team has started using or any new types of sensitive data that have come up in your work.

Key Lessons from Part 1

Data governance for AI isn't a new problem. It's the same question of who can see sensitive information, applied to new tools that most organizations haven't yet built rules around.
The Samsung, hospital, and consultant examples share a common root cause: capable employees using useful tools in a policy vacuum, not malicious intent.
The type of account matters as much as the tool itself: consumer accounts (personal ChatGPT Plus, personal Gmail/Gemini) operate under very different data terms than enterprise accounts (Microsoft Copilot via M365, Google Gemini via Workspace Business/Enterprise).
The placeholder technique, replacing real sensitive details with labeled brackets like [Client Name] or [Budget Figure], is the single most practical habit for protecting sensitive data while preserving AI productivity gains.
Non-technical professionals can create effective data governance tools. The Auckland principal's laminated two-column sheet is more useful in practice than most corporate AI policies that run to 40 pages.
Industry-specific regulations (HIPAA for healthcare, GDPR for EU personal data, FERPA for US student records) apply to AI tools just as they apply to email and cloud storage; the technology being new does not create a compliance exemption.
The first governance action for any team is a simple, visible reference document that answers one question in under 10 seconds: 'Can I put this in an AI prompt?'

When Good Intentions Create Data Breaches

In early 2023, a mid-sized consulting firm in Chicago (let's call them Meridian Advisory) was under pressure to deliver faster client reports. Their senior analysts started using ChatGPT to summarize meeting notes, draft findings, and generate slide copy. Nobody told them not to. Nobody told them how, either. Within six weeks, three analysts had pasted full client financial models into the chat window, one had submitted a competitor analysis containing a client's unreleased acquisition plans, and another had uploaded an HR restructuring memo that named specific employees. No malicious intent. No rogue actors. Just busy professionals using a powerful tool without a framework to guide them.

Meridian's managing partner only discovered the problem when a client asked, during a contract renewal meeting, whether their data was being used to train AI systems. The partner didn't know the answer. That silence cost them the contract, a $400,000 annual retainer. The client's legal team had been monitoring AI governance disclosures after a competitor's data appeared in a ChatGPT response (a well-documented phenomenon from early 2023). Meridian hadn't broken any laws. But they had broken trust, and in professional services, trust is the entire product.

What Meridian experienced is now textbook in AI governance circles: the gap between tool availability and tool governance. The analysts weren't negligent; they were efficient. The problem was structural. No one had defined what data could go where, which tools were approved for which tasks, or what the firm's obligations were to clients whose data entered a third-party AI system. The principle that emerges from their story is blunt: access to an AI tool is not the same as permission to use it with any data you have.

The Default Data Assumption

Most AI tools, including the free and standard paid tiers of ChatGPT, Google Gemini, and others, have historically used conversation data for model improvement unless users explicitly opt out or upgrade to enterprise plans. OpenAI introduced a chat history opt-out in April 2023 and later added enterprise agreements with stronger protections. Always check the current data handling terms for any tool your team uses, because they change, and the defaults matter enormously.

Data Classification: The System That Makes Rules Manageable

Meridian's real failure wasn't using AI. It was having no data classification system. Data classification is simply a way of labeling information by how sensitive it is and what handling rules apply. Think of it like the color-coded folders some offices use: red means restricted, yellow means internal only, green means public. When your team knows which category their data falls into before they open an AI tool, the decision about what to paste or upload becomes almost automatic. The folder color tells you what to do.

Most professional organizations operate with roughly four levels of data sensitivity, even if they've never formally named them. Public data is anything you'd post on your website: press releases, job listings, general product descriptions. Internal data is everyday business information (meeting agendas, project timelines, internal process documents) that isn't secret but isn't for outsiders. Confidential data includes client information, financial details, contracts, employee records, and strategic plans. Restricted data is the most sensitive tier: personally identifiable information (PII), health records, legal documents under privilege, regulated financial data, and anything with explicit contractual confidentiality obligations.

Once you have these categories, you can build simple rules that anyone on your team can follow without a law degree. Public data? Use any AI tool freely. Internal data? Use approved tools with standard caution. Confidential data? Use only enterprise-tier tools with data processing agreements, or don't use AI at all for that specific task. Restricted data? Treat it like cash: manual processes only, or purpose-built secure environments. This isn't bureaucracy. It's the same logic as locking your filing cabinet. The AI era just made the rules more urgent.
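The four rules above amount to a small lookup table, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than a policy engine: the names `Tier`, `ToolClass`, and `may_use` are hypothetical, it assumes an enterprise-tier tool also counts as an approved tool, and it simplifies the Restricted rule to "no general-purpose AI tool" (a purpose-built secure environment would be handled outside this check).

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


class ToolClass(IntEnum):
    ANY = 1         # any AI tool, including consumer accounts
    APPROVED = 2    # a tool on the team's approved list
    ENTERPRISE = 3  # enterprise tier with a data processing agreement


# Minimum tool class required for each tier.
# None means: no general-purpose AI tool at all (assumption, see above).
MIN_TOOL: dict[Tier, Optional[ToolClass]] = {
    Tier.PUBLIC: ToolClass.ANY,
    Tier.INTERNAL: ToolClass.APPROVED,
    Tier.CONFIDENTIAL: ToolClass.ENTERPRISE,
    Tier.RESTRICTED: None,
}


def may_use(tier: Tier, tool: ToolClass) -> bool:
    """Return True if data of this tier may go into a tool of this class."""
    required = MIN_TOOL[tier]
    if required is None:
        return False
    return tool >= required


for tier in Tier:
    allowed = [tool.name for tool in ToolClass if may_use(tier, tool)]
    print(f"{tier.name}: {allowed or 'manual process only'}")
```

The point of writing it this way is that the decision depends only on the data's tier and the tool's class, which is what lets a one-page guide replace case-by-case judgment.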

Creating a Data Classification Policy for a Small Team

Prompt

I manage a 12-person marketing team at a B2B software company. We want to start using AI tools like ChatGPT and Claude for content creation and campaign analysis. I need a simple, one-page data classification guide that tells my team what kinds of information they can and cannot put into AI tools. We work with customer data, campaign performance metrics, competitor research, and internal strategy documents. Write the guide in plain language, no legal jargon. Include four data levels, examples of each, and a clear rule for each level about AI tool use.

AI Response

**Marketing Team AI Data Guide: What Goes In, What Stays Out**

**Level 1: Public (Green Light)**
Examples: Blog post drafts, product descriptions, published case studies, social media copy, general industry research.
AI Rule: Use any approved tool freely. This is what AI tools are built for.

**Level 2: Internal (Yellow Light)**
Examples: Campaign calendars, internal meeting notes without client names, team process documents, general performance benchmarks.
AI Rule: Use approved tools (ChatGPT Plus, Claude Pro, Copilot). Remove names and specific company references before pasting. Don't share outputs outside the team.

**Level 3: Confidential (Orange Light)**
Examples: Customer names and contact lists, campaign ROI tied to specific accounts, competitor intelligence reports, budget figures, contract terms.
AI Rule: Use only enterprise tools with a signed data processing agreement (ChatGPT Enterprise, Microsoft Copilot for M365, Claude for Enterprise). Never use free or personal-tier accounts. When in doubt, ask your manager.

**Level 4: Restricted (Red Light)**
Examples: Customer PII (emails, phone numbers, addresses), payment data, any data covered by GDPR or CCPA, legal agreements.
AI Rule: Do not enter this data into any AI tool. Full stop. Use manual processes or consult IT and legal before proceeding.

**Quick Test:** Before you paste anything into an AI tool, ask: "Would I be comfortable if our biggest customer saw exactly what I'm submitting?" If the answer is no, move up a level.

Healthcare Gets It Right and Shows Everyone Else the Way

The healthcare sector has operated under strict data governance rules since HIPAA became law in 1996. So when AI tools arrived, hospital administrators and clinical teams already had mental muscles for asking: "Is this data protected? Who can see it? Where does it go?" That habit gave them a structural advantage. Atrium Health, a large hospital network in the Carolinas, began piloting AI writing tools for administrative staff in 2022, drafting patient communication letters, summarizing policy documents, generating training materials. Their governance framework was already in place: protected health information (PHI) never left approved, HIPAA-compliant systems. AI tools were evaluated against that standard before anyone touched them.

The lesson for non-healthcare professionals isn't "follow HIPAA." It's "build the habit of asking the governance question before the tool question." Atrium's staff didn't ask "Can AI help me write this?" first. They asked "What kind of data does this task involve?" first. That sequencing change is everything. A sales manager drafting a proposal doesn't need to know HIPAA. But they do need to ask whether the client details they're about to paste into Gemini are covered by a non-disclosure agreement, and whether Gemini's terms allow that. Same mental habit, different context.

Comparing AI Tools by Data Governance Strength

AI Tool	Free/Consumer Tier: Data Used for Training?	Paid Tier Data Protection	Enterprise Option Available?	Best For
ChatGPT (OpenAI)	Yes by default; opt out in settings	ChatGPT Plus: opt-out available; data not used if opted out	Yes: ChatGPT Enterprise; no training on business data by default, SOC 2 compliant	Content creation, analysis, drafting
Claude (Anthropic)	Free tier: conversations may be reviewed	Claude Pro: stronger protections; review their current terms	Yes: Claude for Enterprise; no training on your data	Long documents, nuanced writing, policy work
Microsoft Copilot (M365)	Consumer: standard Microsoft terms	Copilot for M365: data stays within your Microsoft tenant	Yes: add-on to M365 enterprise plans such as E3/E5	Office workflows, Teams, Outlook, Excel
Google Gemini	Consumer: data used per Google's terms	Gemini for Workspace: data not used to train models	Yes: Google Workspace enterprise tiers	Gmail, Docs, Sheets integration
Notion AI	Included in paid Notion plans	Data handled per Notion's privacy terms; no AI training on your data per their policy	Yes: Notion for Enterprise with BAA available	Knowledge management, internal docs

Data governance protections vary significantly between consumer and enterprise tiers. Always verify current terms directly with the vendor; policies update frequently.

A Recruiting Manager's Lesson in Unintended Exposure

Consider a talent acquisition manager at a regional logistics company. She was using Claude Pro to help screen resumes and draft interview feedback summaries. It was a smart use of AI: time-consuming administrative work, clear efficiency gains. The problem emerged when she started pasting full candidate profiles (names, addresses, employment history, salary expectations) directly into her prompts. She wasn't thinking about data classification. She was thinking about getting through 40 applications before Friday. Her intentions were entirely professional. Her data handling was not.

Candidate data is among the most sensitive information a company holds. In many jurisdictions, it's covered by privacy regulations: GDPR in Europe, CCPA in California, and a growing patchwork of state-level laws in the US. When she pasted those profiles into a standard paid-tier AI tool, she potentially created a compliance exposure her company's legal team didn't know existed. The fix was straightforward once identified: use anonymized summaries ("Candidate A: 8 years logistics experience, seeking $85K") instead of real names and details. Same AI benefit, zero data risk. The principle, anonymize before you analyze, became a standing rule for her entire HR team.

The Anonymization Shortcut

Before pasting any document containing real names, client identifiers, or personal details into an AI tool, do a quick find-and-replace. Swap real names for "Client A," "Candidate B," or "Company X." Replace specific financial figures with approximate ranges if the exact number isn't needed for the task. This takes 90 seconds and eliminates the most common source of accidental data exposure. You get the same quality output, because the AI doesn't need the real name to help you write better feedback.
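If you anonymize documents often, the find-and-replace step can be sketched as a short Python script that runs locally before anything is pasted. Treat this as a rough sketch: the function names `anonymize` and `restore` are hypothetical, and the names in the example are made up. It only replaces the exact names you list, so nicknames, initials, email addresses, and other identifiers still need a manual read-through.

```python
import re


def anonymize(text: str, aliases: dict[str, str]) -> str:
    """Swap each listed real name for its alias (case-insensitive).

    Longer names are replaced first so that a short name contained in a
    longer one does not break the longer match.
    """
    for real in sorted(aliases, key=len, reverse=True):
        alias = aliases[real]
        # A function replacement avoids backslash handling in the alias
        text = re.sub(re.escape(real), lambda _m, a=alias: a, text, flags=re.IGNORECASE)
    return text


def restore(text: str, aliases: dict[str, str]) -> str:
    """Put the real names back into the AI's output, locally."""
    for real in sorted(aliases, key=lambda r: len(aliases[r]), reverse=True):
        text = text.replace(aliases[real], real)
    return text


# Made-up names, for illustration only. Keep this mapping on your own machine.
aliases = {"Priya Shah": "Candidate A", "Northwind Freight": "Company X"}
note = "Priya Shah spent 8 years at Northwind Freight. priya shah is seeking $85K."

safe = anonymize(note, aliases)
print(safe)                    # this version is what goes into the prompt
print(restore(safe, aliases))  # real names return only after the AI is done
```

Keeping the alias mapping in one place also makes it easy to reuse the same labels across a batch of documents, so "Candidate A" means the same person everywhere.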

Building a Practical Governance Habit for Your Team

Governance frameworks fail when they live in PDFs that nobody reads. The organizations that actually protect themselves are the ones that build governance into the moment of use: the second before someone presses send or paste. This means making the right behavior the easy behavior. A one-page cheat sheet on every desk. A Slack pinned message with the four data levels. A 10-minute onboarding conversation the first time someone joins a team that uses AI tools. These aren't bureaucratic hurdles. They're the professional equivalent of washing your hands before surgery. Basic, fast, and catastrophically important.

The most effective teams also establish a simple escalation path. When someone isn't sure whether data is safe to use with an AI tool, they need to know exactly who to ask and how fast they'll get an answer. If the answer takes three days, people will skip the question and take their chances. A designated "AI tool owner", often a tech-savvy team member, an operations manager, or an IT liaison, who can answer governance questions within a few hours makes the whole system work. This person doesn't need to be a data scientist. They need to know the company's tool approvals, the data classification tiers, and who in legal or IT to escalate genuinely complex questions to.

There's also the matter of outputs, not just inputs. When an AI tool generates a report, a client email, or a hiring recommendation, that output may contain inferred information about real people or real business situations. Sharing that output externally (forwarding it to a client, publishing it, including it in a legal filing) carries its own governance obligations. Did the AI hallucinate a fact that's now in a client proposal? Did it reproduce a sentence that sounds like it came from a competitor's proprietary document? Reviewing AI outputs before they leave your organization isn't just quality control. It's governance at the output layer, and it's just as important as governance at the input layer.

Build Your Team's AI Data Classification Guide

Goal: Create a practical, one-page data governance reference your team can use immediately when deciding what information is safe to put into AI tools.

1. Open a blank document in Word, Google Docs, or Notion: wherever your team already works and will actually see it.
2. Write a title at the top, '[Your Team Name] AI Tool Data Guide', and add today's date so people know it's current.
3. Create four clearly labeled sections: Public, Internal, Confidential, and Restricted. Use color coding if your tool supports it (green, yellow, orange, red).
4. Under each section, write 3-5 examples of real data your specific team handles. Use your actual job titles, document types, and client situations, not generic placeholders.
5. Under each section, write one clear rule: which AI tools are approved for that data level, and what precautions apply (e.g., 'anonymize names before pasting').
6. Add a 'When In Doubt' box at the bottom that names one specific person on your team or in your organization who can answer AI data questions. Include their name and preferred contact method.
7. Share a draft with one colleague and ask them to find one scenario from their actual workday that the guide doesn't clearly cover. Use their answer to add one more example.
8. Save the final version somewhere your whole team can find it in under 30 seconds: a pinned Slack message, a shared drive folder, or a Notion page linked from your team homepage.
9. Set a calendar reminder for 90 days from now to review the guide and update it based on any new tools your team has adopted or new data types you're handling.

Key Principles from This Section

Access to an AI tool is not the same as permission to use any data you have with that tool; those are two separate decisions.
Data classification gives your team a decision-making shortcut: know the category, know the rule, skip the guesswork.
The four practical tiers (Public, Internal, Confidential, Restricted) apply to virtually every professional context, regardless of industry.
Consumer and enterprise tiers of the same AI tool have meaningfully different data protections. The tier your team uses matters legally and contractually.
Anonymizing data before it enters an AI tool removes the most common source of accidental exposure, usually without reducing the quality of AI assistance.
Governance at the output layer is as important as governance at the input layer: review what the AI produces before it leaves your organization.
The best governance systems are frictionless: one-page guides, named point people, fast escalation paths, and built-in habits rather than policies buried in handbooks.

What to Carry Forward

Every professional using AI tools today is making data governance decisions, whether they know it or not. The Meridian Advisory story, the recruiting manager's candidate profiles, and the hospital network that already had its framework in place aren't cautionary tales about technology. They're stories about professional judgment in a new environment. The tools are fast and genuinely useful. The governance habits that protect your clients, your colleagues, and your organization don't have to be slow or complicated. They just have to exist: clearly, accessibly, and in the hands of everyone who opens a chat window.

Building a Data Governance Habit That Actually Sticks

In 2023, Samsung's semiconductor division made headlines for the wrong reasons. Within weeks of granting employees access to ChatGPT, engineers had pasted proprietary chip design code and internal meeting notes directly into the tool. Three separate incidents. The data left Samsung's control the moment it hit OpenAI's servers. Samsung's response was swift: the company banned ChatGPT internally and began building a private AI system. But the damage was done, and the lesson was stark: when you give professionals a powerful tool without clear rules, they use it in the most efficient way they know. Nobody was trying to cause harm. They were just trying to do their jobs faster.

What Samsung experienced wasn't a technology failure. It was a governance failure. The company had no policy telling employees what could and couldn't be shared with external AI tools. There was no training, no classification system for sensitive data, no moment where someone asked: 'Before you paste that, does this data belong in a third-party system?' The engineers acted rationally inside a vacuum of guidance. This is the core tension in AI data governance: the tools are frictionless by design, and professional habits don't automatically account for where data goes once you hit send.

The principle Samsung's story gives us is uncomfortable but useful: governance gaps get filled by individual judgment, and individual judgment under deadline pressure almost always prioritizes speed. That's not a character flaw; it's human nature. The fix isn't surveillance or bans. It's building simple, memorable rules that professionals can apply in the moment, without needing a lawyer or an IT ticket.

What Data Governance Actually Means for AI Users

Data governance, in plain terms, is deciding who can use what data, in which tools, for which purposes, and making sure everyone knows the rules. For AI users specifically, it means knowing before you type: Is this information safe to share with an external AI service? Could this data identify a real person, expose a trade secret, or violate a legal agreement? You don't need a policy committee to answer those questions. You need a habit.

The Nurse, the Spreadsheet, and the Patient Who Never Consented

A nursing home administrator in the UK (let's call her Claire) was drowning in staffing reports. She discovered that pasting shift data into ChatGPT could generate a readable weekly summary in seconds, saving her nearly two hours every Friday. For three months, she did exactly that. The spreadsheets included staff names, patient-to-staff ratios by ward, and timestamps that could be cross-referenced to identify which patients received reduced care on which days. Claire wasn't sharing patient names directly. But the combination of data was enough to count as health-related personal data under UK GDPR. She hadn't done anything malicious. She had done something efficient, inside a governance vacuum.

Claire's story illustrates what privacy experts call the aggregation problem: individually harmless data points that become sensitive when combined. A ward name isn't sensitive. A shift time isn't sensitive. A care ratio isn't sensitive. Together, they paint a picture that could identify patients and expose the facility to regulatory risk. The UK Information Commissioner's Office has issued guidance specifically warning healthcare administrators about this pattern. Claire's facility eventually implemented a simple rule: no spreadsheet containing operational data tied to care delivery goes into a public AI tool. They adopted Microsoft Copilot inside their existing Microsoft 365 environment instead, where data stayed within their tenancy.

The lesson Claire's case teaches isn't 'don't use AI.' It's 'know your data before you share it.' Healthcare is an extreme example because the regulations are explicit, but the aggregation problem exists in every industry. A sales manager sharing pipeline data, a teacher uploading student performance spreadsheets, and a consultant pasting client financials can all create exposure through combination, even when no single data point looks dangerous on its own.
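For readers comfortable with a little scripting, the aggregation problem can be made concrete. The sketch below is illustrative only: the field names, the sample rows, and the `rare_combinations` helper are all hypothetical, and it is not a substitute for a proper privacy review. It counts how many rows share each combination of values and reports combinations that appear fewer than `k` times.

```python
from collections import Counter

def rare_combinations(rows, fields, k=2):
    """Return value combinations (for the given fields) shared by fewer than k rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical shift records. No names appear anywhere.
shifts = [
    {"ward": "A", "shift": "night", "ratio": "1:8"},
    {"ward": "A", "shift": "day", "ratio": "1:12"},
    {"ward": "B", "shift": "night", "ratio": "1:12"},
    {"ward": "B", "shift": "day", "ratio": "1:8"},
]

# Each field on its own repeats across rows, so nothing stands out.
for field in ("ward", "shift", "ratio"):
    print(field, rare_combinations(shifts, [field]))

# Combined, every row becomes unique and therefore potentially identifying.
print(rare_combinations(shifts, ["ward", "shift", "ratio"]))
```

In this toy data no single column singles anything out, yet the three columns together make every row unique. That is the pattern to look for before a spreadsheet goes into a public AI tool.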

HR Manager: Drafting a Team AI Usage Policy

Prompt

I manage an HR team of 8 people at a mid-sized financial services company. We handle employee data including salaries, performance reviews, disciplinary records, and recruitment information. I want to create a simple one-page policy that tells my team what they can and cannot share with AI tools like ChatGPT or Claude. Make it practical, not legal jargon. Use plain language. Include a quick-reference checklist they can keep at their desk.

AI Response

**HR Team AI Usage Policy: Quick Reference**

**What you can share with AI tools (ChatGPT, Claude, etc.):**
- Job description drafts (before posting, no internal salary bands)
- Generic email templates and communication frameworks
- Training content outlines and onboarding checklists
- Policy document drafts that contain no employee-specific information
- Interview question banks and competency frameworks

**What you must never share with external AI tools:**
- Employee names combined with salary, performance, or disciplinary data
- Recruitment data that identifies specific candidates
- Any document containing national insurance numbers, dates of birth, or home addresses
- Grievance or investigation records
- Anything marked CONFIDENTIAL or RESTRICTED

**The 10-Second Check (ask before you paste):**
1. Does this contain a real person's name + any sensitive detail? → Don't share.
2. Could someone identify an employee from this data, even without a name? → Don't share.
3. Would I be comfortable if this appeared in a data breach report? → If no, don't share.

**Approved tools for sensitive HR work:** Microsoft Copilot within our M365 environment only.

**When in doubt:** Remove all names and identifiers, use placeholder text (Employee A, Department X), then share the anonymized version. Questions? Contact the Data Protection Lead before using AI on any unfamiliar data type.
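Part of a check like the one above can be backed by a quick automated scan before pasting. The following is a minimal sketch, not a vetted detector: the `red_flags` function and its three patterns are assumptions made for illustration, the patterns are deliberately rough, and a clean result does not mean the text is safe. It cannot recognize names, context, or combinations of details, so the human check still comes first.

```python
import re

# Rough, illustrative patterns (assumptions for this sketch, not an official list).
PATTERNS = {
    "marked confidential or restricted": re.compile(r"\b(CONFIDENTIAL|RESTRICTED)\b"),
    # Two letters, six digits, one letter A-D, with optional spaces.
    "possible National Insurance number": re.compile(
        r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"
    ),
    # Day/month/year style dates, which may be dates of birth.
    "possible date of birth": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def red_flags(text):
    """Return the labels of every pattern found in the text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]

# Made-up sample text for demonstration.
draft = "CONFIDENTIAL: J. Smith, DOB 04/11/1986, NI QQ 12 34 56 C"
for flag in red_flags(draft):
    print("Stop and check:", flag)
```

A scan like this only catches the obvious cases. Its value is as a speed bump: if anything is flagged, the text goes back through the 10-second check before it goes anywhere else.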

How a Marketing Agency Turned a Crisis Into a Framework

A boutique marketing agency in Toronto discovered that three account managers had been using Claude and ChatGPT to draft client campaign strategies, pasting in client briefs, audience research, and competitive analysis that were explicitly marked confidential under their client contracts. The agency found out when a client asked why elements of their 'proprietary strategy' had appeared in a competitor's campaign six months later. Direct causation was impossible to prove, but the client terminated the contract. The agency lost $340,000 in annual revenue and spent months rebuilding trust across their remaining client base.

What the agency built afterward became their competitive differentiator. They created a three-tier data classification system, Green (freely shareable), Amber (shareable with anonymization), and Red (internal tools only), and trained every account manager on it in a single 90-minute workshop. They switched client-facing AI work to Claude Pro's Projects feature, which under the terms in force at the time did not use conversations for training by default, and documented this in client contracts as a trust signal. Within a year, they were winning pitches partly because of their 'AI-safe client data' policy. The crisis became the credential.

Data Type	Examples	Safe for Public AI Tools?	Recommended Approach
Green: Public / Generic	Industry trends, job description templates, general frameworks	Yes	Use freely in ChatGPT, Claude, Gemini
Amber: Internal / Operational	Team processes, anonymized reports, draft policies	With caution	Remove names/identifiers first; use enterprise tools where possible
Red: Confidential / Personal	Client contracts, employee records, financial data, patient info	No	Use only in approved internal tools (Copilot M365, private deployments)

Three-Tier Data Classification for AI Users: a practical framework any team can implement
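If your team keeps its guide in a shared script or internal tool, the table above translates naturally into a small lookup. This is a sketch under assumptions: the `DATA_TYPES` entries are examples drawn from the table, `rule_for` is a hypothetical helper, and treating unknown data types as Red is a design choice made here, not something the framework prescribes.

```python
from enum import Enum

class Tier(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"

# One rule per tier, paraphrased from the classification table.
RULES = {
    Tier.GREEN: "Use freely in public AI tools.",
    Tier.AMBER: "Remove names and identifiers first; prefer enterprise tools.",
    Tier.RED: "Use only in approved internal tools.",
}

# Example data types; replace these with what your own team handles.
DATA_TYPES = {
    "industry trends": Tier.GREEN,
    "job description template": Tier.GREEN,
    "draft policy": Tier.AMBER,
    "team process notes": Tier.AMBER,
    "client contract": Tier.RED,
    "employee record": Tier.RED,
}

def rule_for(data_type):
    """Look up the tier and rule; anything unlisted is treated as RED (an assumption)."""
    tier = DATA_TYPES.get(data_type.strip().lower(), Tier.RED)
    return tier, RULES[tier]

tier, rule = rule_for("Client contract")
print(tier.name, "->", rule)
```

Defaulting to Red means a data type nobody has classified yet gets the most cautious rule until someone decides otherwise, which mirrors the "when in doubt, ask" habit.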

The Teacher Who Anonymized Everything and Slept Better

Maria teaches Year 9 English at a secondary school in Melbourne. She started using ChatGPT to generate differentiated feedback comments for student essays, a task that previously consumed four hours every Sunday. Her first instinct was to paste in the essays directly, with student names intact. A colleague mentioned FERPA-equivalent concerns under Australian privacy law, and Maria paused. She developed a simple substitution habit: before pasting any student work, she replaces the student's name with a code (S1, S2, S3) in a private notepad, generates the AI feedback, then matches it back to the real student in her own records. The whole substitution process takes 90 seconds per class set.

Maria's approach is low-tech, takes minimal training, and requires no new software. It's the kind of practical governance habit that scales because it costs almost nothing to implement. Her school has since adopted her method as a recommended practice for all teachers using AI tools for student feedback. The principle she embodies: you don't need a data governance committee to make safe choices. You need one clear rule and the habit of applying it before you type.
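Maria does her substitution by hand, but the same habit can be scripted. The sketch below is one possible way to do it, not her method: `pseudonymize` and `restore` are hypothetical helpers, the names are invented, and the code only replaces names you explicitly list, so nicknames, misspellings, and other identifying details still need a human read-through.

```python
import re

def pseudonymize(text, names, prefix="S"):
    """Replace each listed name with a code (S1, S2, ...) and return the key."""
    mapping = {f"{prefix}{i}": name for i, name in enumerate(names, start=1)}
    # Replace longer names first so a full name wins over a shorter overlap.
    for code, name in sorted(mapping.items(), key=lambda kv: len(kv[1]), reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", code, text)
    return text, mapping

def restore(text, mapping):
    """Swap the codes back to real names using the private key."""
    for code, name in mapping.items():
        text = re.sub(rf"\b{re.escape(code)}\b", name, text)
    return text

# Invented example text and names.
essay = "Priya Nair argues that the narrator is unreliable. Tom Reid disagrees with Priya Nair."
safe_text, key = pseudonymize(essay, ["Priya Nair", "Tom Reid"])
print(safe_text)                 # this version is what goes into the AI tool
print(restore(safe_text, key))   # the key never leaves your own records
```

The important part is where the key lives. Like Maria's private notepad, the mapping stays on your side and is never pasted into the AI tool.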

The 'Newspaper Test' for AI Data Sharing

Before pasting anything into an AI tool, ask: 'If this prompt appeared on the front page of a newspaper, would my company, client, or colleagues be embarrassed or harmed?' This isn't about paranoia; most of what you share is fine. It's a half-second gut check that catches the cases where habit and deadline pressure would otherwise override judgment. If the answer is even 'maybe,' anonymize first.

Making Governance Real in Your Daily Workflow

Data governance sounds like something that lives in a policy document nobody reads. The professionals who do it well have made it invisible: a 10-second mental check that happens before every AI interaction, like looking both ways before crossing a street. The three-tier classification system (Green, Amber, Red) is a starting point, not a bureaucratic requirement. You can apply it right now, today, without any approval from IT or legal. Look at what you typically paste into AI tools and mentally sort it. Most of it is probably Green. A small portion is Amber. A very small portion is Red. That awareness alone changes behavior.

Tool choice matters too. ChatGPT Free and the standard Claude.ai interface may use your conversations to improve their models unless you opt out in settings; both companies document this in their privacy policies, and it is easy to disable. ChatGPT Plus users can turn off training in Settings → Data Controls. Claude's paid consumer plans, including Pro, have an equivalent privacy setting, and vendor defaults change over time, so confirm the current setting rather than assuming it. Microsoft Copilot within a work Microsoft 365 account keeps data within your organization's tenancy and does not use it to train a public model. Google Workspace's Gemini operates under Google's enterprise data protection terms when accessed through a paid Workspace account. Knowing which tool you're in, and what its data terms are, is a 15-minute investment that changes every interaction afterward, provided you repeat it when the terms change.

The final shift is cultural, not technical. Samsung's ban didn't solve the underlying problem; it just pushed employees toward workarounds. The agencies, healthcare administrators, and teachers who handle AI data well didn't get there through restriction. They got there through clarity. Clear rules, applied consistently, explained in plain language, reviewed when tools change. That's data governance at the professional level. No IT degree required.

Build Your Personal AI Data Governance Checklist

Goal: Create a practical, one-page reference document that classifies your typical work data and tells you, clearly, what's safe to share with AI tools and what isn't.

1. Open a blank document in Word, Google Docs, or Notion. Title it 'My AI Data Sharing Rules: [Your Name/Team].'
2. List 8-10 types of data you regularly work with (e.g., client emails, budget spreadsheets, employee feedback, meeting notes, customer names, product specs).
3. Next to each data type, write one of three labels: GREEN (safe to share as-is), AMBER (safe after anonymizing), or RED (internal tools only, never public AI).
4. For each AMBER item, write one sentence describing how you would anonymize it before sharing (e.g., 'Replace client name with CLIENT A, remove contract values').
5. Open your main AI tool (ChatGPT, Claude, or Gemini) and go to its privacy/data settings. Find and screenshot the setting that controls whether your conversations are used for model training. Note whether it is currently on or off.
6. Write one sentence at the top of your document stating which AI tools are approved for your work context and which require caution (check with your manager if unsure).
7. Share a draft of your checklist with one colleague and ask them to add any data types you missed from your shared workflow.
8. Save the final version somewhere you'll see it: pin it in your browser bookmarks, add it to your Notion workspace, or print it and keep it at your desk.
9. Set a calendar reminder for 90 days from now to review and update the checklist as your AI tool usage evolves.

Governance gaps get filled by individual judgment under pressure; clear rules prevent the Samsung problem before it starts.
The aggregation problem means individually harmless data points can become sensitive when combined. Always consider what your data reveals as a set, not just as individual fields.
A three-tier classification system (Green / Amber / Red) gives any professional a decision framework they can apply in seconds, without legal training.
Anonymization is the practical middle ground: removing or replacing names and identifiers lets you use AI on Amber-category data with far less exposure.
Tool choice is a governance decision. ChatGPT Free, Claude Pro, Microsoft Copilot, and Google Workspace Gemini all have different data handling terms that materially affect what's safe to share where.
Opt-out settings exist and take 60 seconds to configure. Disabling conversation training in ChatGPT, or confirming the current training setting in Claude, is a quick action with lasting impact.
Governance works best when it's cultural, not restrictive. Teams that explain the 'why' behind data rules get better compliance than teams that issue bans.

Key Takeaways

Data governance for AI users is not just an IT function; it's a professional habit that every person using AI tools needs to develop.
Before sharing anything with an AI tool, apply the 10-second check: Does this contain real people's data? Could it expose confidential information? Would it pass the newspaper test?
Classify your work data into three tiers (Green, Amber, Red) and treat each tier differently based on the sensitivity of what it reveals.
Anonymize Amber-category data before using public AI tools. Replace names with codes, remove identifying numbers, and use placeholder text.
Know your tool's data terms. Enterprise and paid work tiers, such as Copilot in Microsoft 365 and Gemini in Google Workspace, generally offer stronger protections than free consumer tiers; check the current terms for individual paid plans such as Claude Pro.
The professionals who handle AI data governance best aren't the most technically sophisticated; they're the ones with the clearest, simplest rules applied most consistently.

