Stay Safe: Risks, Rules, and Responsibility
AI Risks and Governance in Finance
Part 1: When AI Gets It Wrong, and Who's Responsible
Historical Record
DBS Bank
In early 2023, DBS Bank in Singapore rolled out AI-assisted tools across several internal finance functions, including expense categorization and preliminary financial report drafting.
The case demonstrates how AI governance gaps can create invisible drift between AI outputs and human understanding, even when no catastrophic failure occurs.
The DBS case is instructive precisely because nothing catastrophic happened. No headlines, no regulatory fine, no job losses. Just a slow, invisible drift between what the AI was doing and what the finance team assumed it was doing. That gap, between AI output and human understanding, is where most AI risk in finance actually lives. It's not dramatic system failures. It's confident-sounding wrong answers that get copy-pasted into board decks. It's automated summaries that miss a one-time charge that changes the story entirely. It's an AI tool flagging a vendor as low-risk because its training data predates a major lawsuit.
DBS eventually resolved the issue by implementing a monthly spot-check protocol: a human reviewer samples 10% of AI-categorized transactions each month to catch drift early. Simple. Manual. Effective. The lesson wasn't that AI was too risky to use. It was that AI without a governance structure around it creates invisible exposure. Finance teams that understand the distinction between using AI and governing AI will be the ones that get the benefits without the surprises. That's what this lesson is about.
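A spot-check like this can be drawn with a few lines of code rather than picked by hand, which keeps the sample unbiased. The sketch below is illustrative only: the function name `monthly_spot_check`, the transaction ID format, and the use of a fixed seed are assumptions, not a description of what DBS actually ran.

```python
import math
import random


def monthly_spot_check(transaction_ids, sample_rate=0.10, seed=None):
    """Return a random sample of transaction IDs for human review."""
    ids = list(transaction_ids)
    if not ids:
        return []
    # A fixed seed makes the draw reproducible if someone later asks how it was chosen.
    rng = random.Random(seed)
    # Round before taking the ceiling so floating-point noise cannot add an extra item.
    sample_size = max(1, math.ceil(round(len(ids) * sample_rate, 9)))
    return rng.sample(ids, sample_size)


# Hypothetical month of 250 AI-categorized transactions.
ids = [f"TXN-{n:04d}" for n in range(1, 251)]
to_review = monthly_spot_check(ids, sample_rate=0.10, seed=2023)
print(len(to_review), "transactions selected for manual review")
```

Sampling at random, rather than reviewing the first or largest items, gives every category of transaction a chance of being checked, which is what makes slow drift visible.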
The Principle Hidden in the DBS Story
Every AI tool you use in finance is making decisions based on patterns it learned from past data. It doesn't know your current business context. It doesn't know that your company just restructured, that a major client churned, or that your industry's cost structure changed after a supply chain disruption. The tool confidently applies old patterns to new situations, and it does so without flagging uncertainty unless you've specifically set it up to do so. This is the foundational risk in AI-assisted finance work: the tool sounds authoritative even when it's wrong. Unlike a junior analyst who might say 'I'm not sure about this one,' AI tools typically produce clean, formatted, confident-looking output regardless of how reliable the underlying reasoning is.
What 'AI Governance' Actually Means for Finance Professionals
Hallucinations in the Boardroom: A Story From Professional Services
A mid-sized consulting firm in Chicago (around 200 employees, serving mid-market private equity clients) started using ChatGPT Plus in late 2023 to accelerate the production of financial due diligence summaries. Analysts would feed the tool a company's financial statements and ask it to produce a narrative summary of revenue trends, margin performance, and working capital dynamics. The time savings were real: what used to take four hours took forty-five minutes. Then, during a live client presentation, a partner noticed a figure in the summary didn't match the source document. The AI had calculated a compound annual growth rate incorrectly. The gap was small, 14.3% instead of 12.1%, but enough to change the narrative about the target company's trajectory.
The error came from what AI researchers call 'hallucination': the model generating plausible-sounding but incorrect information, particularly with calculations. ChatGPT and Claude are language models, not spreadsheets. They're extraordinarily good at writing, summarizing, and structuring ideas. They are not reliably accurate at arithmetic, especially compound calculations or percentage changes across multiple periods. The consulting firm's analysts had treated the AI like a calculator. It isn't one. The fix was straightforward: all numerical claims in AI-generated summaries now get cross-checked against source documents before any output leaves the team. The AI still saves hours. The humans still own the numbers.
This distinction, AI for language tasks and humans for number verification, is one of the most practical governance rules you can implement this week. It doesn't require any technical setup. It's a team agreement: AI drafts the narrative, humans verify every figure cited. The Chicago firm formalized this as a two-minute checklist at the end of every AI-assisted document. Six items, each tied to a specific type of error they'd seen. That checklist is now part of their standard operating procedure for all client deliverables. That's governance. Simple, practical, and built from a real mistake.
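For a figure like CAGR, the cross-check can be a short deterministic calculation done outside the AI tool. The sketch below uses the standard formula, (end / start) raised to 1 / years, minus 1. The function names, the 0.1-point tolerance, and the start and end values are hypothetical; the inputs were chosen only so the recomputed rate lands near 12.1% and are not the firm's actual data.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def matches_claim(claimed_pct, start_value, end_value, years, tolerance_pct=0.1):
    """True if an AI-cited CAGR (in percent) is within tolerance of the recomputed rate."""
    actual_pct = cagr(start_value, end_value, years) * 100
    return abs(actual_pct - claimed_pct) <= tolerance_pct


# Hypothetical inputs: revenue of 10.0 growing to 14.09 over 3 years.
print(f"Recomputed CAGR: {cagr(10.0, 14.09, 3):.1%}")
print("AI-cited 14.3% matches source:", matches_claim(14.3, 10.0, 14.09, 3))
```

The same calculation in a spreadsheet cell works just as well; the point is that the number comes from the source data and a formula, not from the language model.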
Prompt
I'm going to paste in three years of quarterly revenue data for a manufacturing company. Please write a 200-word narrative summary of the revenue trend for a due diligence report. After the summary, add a separate section called 'Figures to Verify' that lists every specific number you've cited, so I can cross-check each one against the source document before this goes to the client. Here is the data:

Q1 2022: $4.2M, Q2 2022: $4.8M, Q3 2022: $5.1M, Q4 2022: $4.9M
Q1 2023: $5.3M, Q2 2023: $5.7M, Q3 2023: $6.0M, Q4 2023: $5.8M
Q1 2024: $6.4M, Q2 2024: $6.9M, Q3 2024: $7.2M, Q4 2024: $7.0M
AI Response
**Revenue Trend Summary**

The company has delivered consistent top-line growth over the three-year period from 2022 through 2024, with annual revenue rising from approximately $19.0 million in 2022 to $23.7 million in 2023, and reaching approximately $27.5 million in 2024. This represents cumulative growth of roughly 45% over the full period.

Quarterly performance shows a recurring pattern: strong acceleration through Q1–Q3 each year, followed by a modest Q4 softening. This seasonal dynamic is consistent across all three years, suggesting it reflects a structural characteristic of the business, potentially tied to customer procurement cycles or project-based revenue recognition, rather than any deteriorating trend. The Q3-to-Q4 decline averages approximately 3–4% annually, which is modest and has not prevented strong year-over-year growth.

The trajectory from Q1 2022 ($4.2M) to Q4 2024 ($7.0M) reflects a business that has nearly doubled its quarterly revenue run rate in three years.

---

**Figures to Verify Against Source Document**

- 2022 full-year total: ~$19.0M (sum of four quarters)
- 2023 full-year total: ~$23.7M (sum of four quarters)
- 2024 full-year total: ~$27.5M (sum of four quarters)
- Cumulative 3-year growth: ~45%
- Q3-to-Q4 average decline: ~3–4%
- Q1 2022 starting point: $4.2M
- Q4 2024 ending point: $7.0M

*Please verify all totals and percentages against the original financial statements before client distribution.*
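The 'Figures to Verify' list only helps if someone actually recomputes the numbers. As a minimal sketch, the check below re-derives the annual totals and cumulative growth from the quarterly data in the prompt and compares them with the figures cited in the sample response. The variable names and the 0.5 tolerance (to allow for the response's "approximately" wording) are assumptions. Run against this example, it shows why the habit matters: the four 2023 quarters sum to $22.8M, not the ~$23.7M cited.

```python
# Quarterly revenue in $M, copied from the prompt above.
quarterly_revenue = {
    2022: [4.2, 4.8, 5.1, 4.9],
    2023: [5.3, 5.7, 6.0, 5.8],
    2024: [6.4, 6.9, 7.2, 7.0],
}

# Figures as cited in the sample AI response.
claimed = {
    "2022 total": 19.0,
    "2023 total": 23.7,
    "2024 total": 27.5,
    "cumulative growth %": 45.0,
}

totals = {year: round(sum(q), 1) for year, q in quarterly_revenue.items()}
recomputed = {
    "2022 total": totals[2022],
    "2023 total": totals[2023],
    "2024 total": totals[2024],
    "cumulative growth %": round((totals[2024] / totals[2022] - 1) * 100, 1),
}


def find_discrepancies(claimed, recomputed, tolerance=0.5):
    """Return {label: (claimed, recomputed)} for figures that differ by more than tolerance."""
    return {
        label: (claimed[label], recomputed[label])
        for label in claimed
        if abs(claimed[label] - recomputed[label]) > tolerance
    }


discrepancies = find_discrepancies(claimed, recomputed)
for label, (said, actual) in discrepancies.items():
    print(f"CHECK {label}: response says {said}, source data gives {actual}")
```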
The Same Risk, Different Industry: Healthcare Finance
A regional hospital network in the American Midwest began using Microsoft Copilot, integrated into their existing Microsoft 365 environment, to help the finance team draft variance analysis reports. These are monthly documents explaining why actual spending differed from budget, shared with department heads across clinical and administrative functions. Copilot was pulling from data in Excel and drafting narrative explanations automatically. For routine variances (overtime in nursing, supply cost increases), the drafts were solid. But in one report, Copilot described a significant positive variance in pharmaceutical spending as 'improved procurement efficiency.' The actual reason was a delayed drug shipment that pushed costs into the next month. Clinically significant. Financially misleading if left uncorrected.
The hospital's finance director, who caught the error before distribution, described the problem precisely: 'Copilot explained the numbers perfectly. It just didn't know what the numbers meant.' That's the core challenge with AI in finance contexts where numbers carry operational meaning. A variance isn't just a math problem; it tells a story about what happened in the business. AI tools don't have access to the conversations, the emails, the vendor calls, or the clinical decisions that explain why a number moved. Humans do. The hospital now requires that all AI-drafted variance narratives be reviewed by someone who attended the relevant operational meetings before the report is finalized. The AI still writes the first draft. The human adds the context.
Comparing AI Risk Across Common Finance Tasks
| Finance Task | AI Tool Commonly Used | Where AI Adds Value | Primary Risk | Governance Rule |
|---|---|---|---|---|
| Expense categorization | Copilot, Xero AI, QuickBooks AI | Speed, consistency across large volumes | Outdated training data misclassifies new expense types | Monthly spot-check: human reviews 10% of categorizations |
| Financial narrative writing | ChatGPT Plus, Claude Pro | First drafts, structure, plain-English explanations | Hallucinated figures, incorrect calculations | Verify every cited number against source before distribution |
| Variance analysis reports | Microsoft Copilot | Drafting routine explanations, formatting | Missing operational context behind the numbers | Review by someone with meeting/operational knowledge |
| Invoice processing | Copilot, Bill.com AI | Routing, matching, flagging duplicates | False positives or missed duplicates in edge cases | Human approval required above defined dollar threshold |
| Budget forecasting | Excel Copilot, Anaplan AI | Scenario modeling, trend extrapolation | Overconfident projections based on historical patterns only | Label all AI forecasts as 'draft, assumptions unverified' |
| Audit preparation | ChatGPT Plus, Claude Pro | Organizing documentation, drafting responses | Omitting material items AI wasn't shown | Auditor or controller reviews completeness before submission |
| Contract financial review | Claude Pro, Copilot | Summarizing payment terms, flagging key clauses | Misreading complex or ambiguous language | Legal or finance lead reviews all flagged clauses directly |
A CFO's Perspective: Governance as Professional Responsibility
Sarah Chen, CFO of a $50M e-commerce company, introduced AI tools to her four-person finance team in mid-2023. She was pragmatic about it: 'We're not a bank. We don't have a compliance department. If something goes wrong with an AI output, that's on me.' Her team started using Claude Pro for drafting board-level financial commentary and ChatGPT Plus for preparing lender covenant compliance summaries. Both tasks involve language that carries legal and fiduciary weight. Chen implemented one rule immediately: nothing AI-generated goes to an external party (investors, lenders, auditors) without her personal sign-off. Internal use was more flexible. External use required her eyes.
That single rule, internal vs. external, is one of the cleanest governance frameworks available to small finance teams. It doesn't require a policy document or a technology committee. It requires one decision: who is seeing this output, and what are the consequences if it's wrong? For Chen's team, an internal draft with an error gets corrected before it matters. An external document with an error reaches a lender or investor board. The stakes are categorically different. Her team saves roughly eight hours per month on drafting work. She spends about ninety minutes per month reviewing external-facing AI outputs. That's a return on governance that any CFO can justify.
The Internal vs. External Rule: Start Here
Putting Governance Into Practice: What Finance Teams Actually Do
Governance sounds like a compliance department problem. In practice, for most finance teams using AI tools today, it's three habits: checking outputs before they travel, knowing which tasks AI handles well and which it doesn't, and keeping a short log of errors when they happen. That last one, the error log, is underused and extremely valuable. When the Chicago consulting firm caught the CAGR error, they wrote it down: what the AI produced, what the correct figure was, and what type of task had generated the mistake. Over six months, they had twelve entries. Eleven were in the same category: multi-step percentage calculations. That pattern told them exactly where to focus their verification effort.
An error log doesn't need to be sophisticated. A shared Google Sheet or a note in Notion works fine. Date, tool used, task type, what went wrong, what the correct output should have been. Five columns. Five minutes per entry. Over a quarter, it becomes a map of your team's specific AI risks, not generic risks from a white paper, but the actual failure modes you're encountering in your actual work. That's more useful than any AI risk framework published by a consulting firm, because it's calibrated to your tools, your data, and your workflows.
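If the log lives in a spreadsheet, a pivot table by task type does the pattern-finding. For teams that prefer a script, the sketch below models the same five columns and counts entries per task type. The class name, field names, and sample entries are hypothetical, written only to show the shape of the log.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class ErrorLogEntry:
    """One row of the error log, mirroring the five columns described above."""
    logged_on: date
    tool: str
    task_type: str
    what_went_wrong: str
    correct_output: str


def errors_by_task_type(entries):
    """Count log entries per task type, most frequent first."""
    return Counter(entry.task_type for entry in entries).most_common()


# Hypothetical entries for illustration only.
log = [
    ErrorLogEntry(date(2024, 1, 15), "ChatGPT Plus", "multi-step percentage calculation",
                  "CAGR overstated", "Recomputed from source statements"),
    ErrorLogEntry(date(2024, 2, 3), "ChatGPT Plus", "multi-step percentage calculation",
                  "Margin change misstated", "Recomputed from source statements"),
    ErrorLogEntry(date(2024, 2, 20), "Microsoft Copilot", "variance narrative",
                  "Timing difference described as efficiency gain", "Delayed shipment"),
]

for task_type, count in errors_by_task_type(log):
    print(f"{count} x {task_type}")
```

Whichever category rises to the top of that count is where the team's verification time is best spent.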
The second habit, knowing which tasks AI handles well, develops naturally over time, but you can accelerate it by being explicit about categories. Most finance teams find that AI is highly reliable for structuring and formatting (turning bullet points into professional prose), moderately reliable for summarizing documents it's been given (with human verification of key figures), and less reliable for anything requiring calculation, current market data, or operational context it hasn't been provided. Building a simple one-page 'AI use guide' for your team, even just a list of green-light, yellow-light, and red-light task categories, creates shared expectations and reduces the chance that a team member over-trusts an output in a high-stakes situation.
Goal: Produce a working AI governance checklist tailored to your team's actual tools and tasks, plus an active Error Log that will help you identify your highest-risk AI use cases over time.
1. Open a blank document in Word, Google Docs, or Notion, title it 'AI Output Review Checklist' and add today's date.
2. List every AI tool your finance team currently uses or has tried in the past three months (e.g., ChatGPT Plus, Microsoft Copilot, Claude Pro, Xero AI). Write one sentence describing the primary task each tool is used for.
3. For each tool and task combination, write one sentence answering: 'What is the most likely way this output could be wrong?' Use the comparison table in this lesson as a reference if needed.
4. Create two columns: 'Internal Use' and 'External Use.' Under each, list the review steps required before an AI output is used. Internal might be 'read once, check figures,' External might be 'manager sign-off, verify all cited numbers.'
5. Draft a short 'Figures to Verify' habit: write a one-sentence rule your team will follow every time AI produces a document containing specific numbers (e.g., 'Every number cited in an AI output gets checked against the source document before the file is shared.').
6. Identify one finance task your team has done with AI in the last month. Retrieve that output and spend five minutes checking it against the source data. Note any discrepancies, even small ones, in a new tab labeled 'Error Log.'
7. Set up the Error Log tab with five columns: Date, Tool Used, Task Type, What AI Produced, What Was Correct. Add your finding from Step 6 as the first entry.
8. Share the draft checklist with one colleague on your finance team and ask them to add one risk they've personally noticed when using AI tools.
9. Schedule a 20-minute team meeting within the next two weeks to review the checklist together, agree on the internal vs. external rule, and assign one person to maintain the Error Log each month.
Key Lessons From Part 1
- AI tools in finance produce confident-looking output regardless of accuracy; the absence of uncertainty signals doesn't mean the output is correct.
- The most common AI risk in finance isn't dramatic failure; it's slow, invisible drift between what the AI is doing and what the team assumes it's doing.
- AI language models are not calculators. Always verify numerical claims, especially percentages, compound calculations, and multi-period aggregates, against source documents.
- AI cannot supply the operational context behind financial numbers. Variance analysis, forecasting narratives, and audit responses all require human knowledge of what actually happened in the business.
- The internal vs. external rule is the simplest effective governance framework for small finance teams: faster review for internal use, mandatory human sign-off before anything reaches external parties.
- A team error log, even a basic spreadsheet, turns individual mistakes into institutional knowledge, helping you identify your highest-risk task categories over time.
- Governance doesn't require a technology department. It requires clear team agreements about which tasks AI handles, how outputs are checked, and who owns the final product.
When AI Gets the Numbers Wrong: Bias, Errors, and the Audit Trail Problem
In 2023, a mid-sized commercial real estate firm in Atlanta rolled out an AI-assisted underwriting tool to speed up loan risk assessments. The tool had been trained on a decade of historical loan data and could produce a risk score for any new application in under 90 seconds. The finance team was thrilled. Processing time dropped by 60%. Then, six months in, a compliance officer noticed something uncomfortable: applications from businesses in predominantly Black and Hispanic neighborhoods were being flagged as high-risk at nearly twice the rate of comparable businesses in majority-white neighborhoods, even when the financials looked nearly identical. The AI wasn't being intentionally discriminatory. It was doing exactly what it was designed to do: learn from historical patterns. The problem was that historical lending data already contained decades of discriminatory patterns, and the model had absorbed them faithfully.
This is what AI researchers call training data bias, and it is one of the most serious risks facing finance teams using AI tools today. The Atlanta firm had not built the tool themselves; they had purchased it from a vendor and deployed it with minimal testing. Nobody had asked the critical governance question: where did this model learn what it knows, and does that history reflect the decisions we actually want to make? When the issue surfaced, the firm faced a potential fair lending violation under the Equal Credit Opportunity Act. They had to pause the tool, hire external auditors, and retrain their staff on manual review protocols, erasing most of the efficiency gains they had celebrated just months earlier.
The lesson extracted from Atlanta is not that AI tools are inherently discriminatory. It is that AI reflects the data it was trained on, and in finance, that data almost always carries the fingerprints of past human decisions, some of which were biased, some of which were wrong, and some of which simply no longer apply to current market conditions. A finance or accounting professional deploying AI does not need to understand machine learning to manage this risk. They need to ask the right questions before deployment and build review processes that catch these patterns before they become regulatory problems.
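One review process that needs no machine learning knowledge is a periodic comparison of flag rates across groups, which is essentially what the Atlanta compliance officer noticed by eye. The sketch below is a screening aid under stated assumptions, not a legal fair-lending test: the group labels and counts are hypothetical, and a real review would also need to confirm that the applications being compared have similar financials.

```python
from collections import defaultdict


def high_risk_rates(applications):
    """Share of applications flagged high-risk per group.

    applications: iterable of (group_label, flagged_high_risk) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, flagged in applications:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}


def rate_ratio(rates, group, reference_group):
    """How many times more often `group` is flagged than `reference_group`."""
    return rates[group] / rates[reference_group]


# Hypothetical counts: 100 comparable applications per neighborhood group.
applications = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 20 + [("group_b", False)] * 80
)

rates = high_risk_rates(applications)
print(rates)
print("Flag-rate ratio:", rate_ratio(rates, "group_a", "group_b"))
```

A ratio well above 1 does not prove discrimination by itself, but it is the kind of signal that should trigger a compliance review before a regulator finds it first.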
What 'Training Data Bias' Means for Your Team
The Hallucination Problem in Financial Reporting
Bias is one category of AI risk. Hallucination is another, and in some ways, it is more immediately dangerous for finance professionals because it is invisible. AI hallucination is when a model produces output that sounds completely authoritative and is factually wrong. Not vague. Not uncertain. Confidently, specifically, numerically wrong. A well-known example from 2023 involved a New York attorney who submitted legal filings that cited multiple court cases, all of which had been fabricated by ChatGPT. The cases had realistic names, realistic citation formats, and realistic-sounding legal reasoning. They did not exist. The attorney was sanctioned. This same failure mode applies directly to finance.
Imagine asking an AI assistant to summarize the key financial ratios from a competitor's annual report, or to pull the current IRS depreciation schedule for commercial equipment, or to describe the accounting treatment for a specific type of lease under ASC 842. In each case, a capable AI tool might produce an answer that reads like it came from a senior accountant, and contains figures that are subtly or significantly wrong. The AI is not lying. It is pattern-matching from its training data and generating what a correct answer would plausibly look like. When the training data is slightly outdated, or when the query touches an edge case the model has not seen clearly, the result can be precise-sounding fiction.
The professional protection here is not to stop using AI for financial research or summarization. It is to treat AI-generated financial data the same way you would treat a first draft from a junior analyst: a useful starting point that requires verification before it leaves your desk. Any number, rate, regulation reference, or accounting standard cited by an AI tool should be checked against the primary source (the IRS website, the FASB codification, the actual filing) before it is used in a client document, an audit workpaper, or a board presentation. This is not a burden. It is the same standard of care professionals have always applied to financial information.
Prompt
I need to understand how operating leases are reported on the balance sheet under ASC 842 for a mid-sized manufacturing company. Please summarize the key requirements in plain language, and flag any areas where you are uncertain or where I should verify against current FASB guidance before relying on your answer.
AI Response
Under ASC 842, which took effect for most private companies by 2022, operating leases that previously stayed off the balance sheet must now be recognized as both a right-of-use (ROU) asset and a corresponding lease liability. Here's what that means in practice:

**Right-of-Use Asset:** This represents the company's right to use the leased asset over the lease term. It is calculated as the present value of the future lease payments, adjusted for any initial direct costs, prepaid rent, or lease incentives.

**Lease Liability:** This is the obligation to make future lease payments, discounted at the rate implicit in the lease, or, if that rate is not readily determinable, at the company's incremental borrowing rate.

**Income Statement Treatment:** Unlike finance leases, operating leases show a single straight-line lease expense on the income statement rather than separate interest and amortization charges.

**Areas Where You Should Verify:**

(1) Short-term lease exemptions: leases under 12 months may qualify for an exemption, but the specific conditions should be confirmed against current FASB ASC 842-20-25.
(2) Variable lease payments: the treatment of variable components depends on whether they are based on an index or rate, which affects measurement.
(3) Your company's specific discount rate methodology should be reviewed with your auditor.

I recommend cross-checking the lease liability calculation methodology against FASB's ASC 842 implementation guidance or your audit firm's technical accounting team before finalizing any balance sheet presentation.
How a Global Retailer Built a Human-in-the-Loop System
Marks & Spencer, the British retailer with operations across 30+ countries, began integrating AI into its finance function for invoice processing and accounts payable automation starting in the early 2020s. The scale of the problem they were solving was real: processing thousands of supplier invoices across multiple currencies, tax jurisdictions, and contract terms created significant manual workload and error risk. The AI tool they deployed could match invoices to purchase orders, flag discrepancies, and route exceptions for human review, handling roughly 80% of invoices without any human touch. That 80% figure is important context. The system was not designed to handle 100%.
The governance structure M&S built around the tool is what made the difference. Every invoice above a certain value threshold, regardless of whether the AI had matched it successfully, required a human sign-off. Any invoice where the AI confidence score fell below a defined threshold was automatically routed to a human reviewer. Monthly audits compared AI-processed invoices against a random sample reviewed manually, to catch any systematic errors the AI might be developing over time. This is the human-in-the-loop model, and it is the gold standard for AI deployment in finance. The AI does the volume work. Humans set the rules, review the exceptions, and periodically audit the patterns. Neither operates in isolation.
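The routing rules in a human-in-the-loop setup are simple enough to write down explicitly, which is itself a useful governance exercise. The sketch below is a hypothetical illustration of the two rules described above, not M&S's actual system: the `Invoice` fields, the $10,000 value threshold, and the 0.90 confidence cut-off are all assumed values that each team would set for itself.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    amount: float
    ai_confidence: float  # 0.0 to 1.0, as reported by the (hypothetical) matching tool
    ai_matched: bool      # whether the AI matched the invoice to a purchase order


def route(invoice, value_threshold=10_000.0, min_confidence=0.90):
    """Decide who handles an invoice under the two rules described above."""
    # Rule 1: high-value invoices always get sign-off, even if the AI match succeeded.
    if invoice.amount > value_threshold:
        return "human_sign_off"
    # Rule 2: unmatched or low-confidence invoices go to a human reviewer.
    if not invoice.ai_matched or invoice.ai_confidence < min_confidence:
        return "human_review"
    return "auto_process"


# Hypothetical invoices.
for inv in [
    Invoice("INV-001", 25_000.0, 0.99, True),
    Invoice("INV-002", 480.0, 0.62, True),
    Invoice("INV-003", 1_250.0, 0.97, True),
]:
    print(inv.invoice_id, "->", route(inv))
```

Checking the value threshold first matters: a confident AI match should never be enough to bypass sign-off on a large payment.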
Comparing AI Risk Profiles Across Finance Functions
| Finance Function | Primary AI Risk | Consequence if Wrong | Recommended Human Check |
|---|---|---|---|
| Invoice & AP Processing | Mismatched line items, duplicate payments | Overpayment, vendor disputes, cash flow errors | Random sample audits monthly; human review above threshold |
| Expense Categorization | Miscategorized spend, tax misclassification | Incorrect P&L, tax filing errors | Controller review of AI-tagged categories quarterly |
| Financial Forecasting | Overconfident projections based on outdated patterns | Bad strategic decisions, missed targets | CFO/analyst review of assumptions, not just outputs |
| Credit Risk Scoring | Training data bias, demographic disparities | Fair lending violations, regulatory penalties | Compliance audit before deployment; ongoing disparity monitoring |
| Fraud Detection | False positives blocking legitimate transactions | Customer friction, operational disruption | Tiered review: AI flags, human investigates before action |
| Financial Report Drafting | Hallucinated figures, incorrect regulatory citations | Restatements, audit findings, reputational damage | Every number verified against source data before publication |
| Payroll Processing | Errors in tax withholding, benefit deductions | Employee harm, IRS penalties | HR/payroll specialist review of exception reports weekly |
The CFO Who Trusted the Dashboard Too Much
A regional healthcare network in the American Midwest deployed an AI-powered financial dashboard in 2022 to give its CFO real-time visibility into operating margins across 12 facilities. The tool pulled data from multiple source systems, applied AI-driven categorization, and produced clean visualizations that updated daily. For about eight months, the CFO used this dashboard as her primary decision-making tool, using it to identify underperforming facilities and make staffing recommendations. Then the annual audit revealed that one facility's labor costs had been systematically miscategorized by the AI: contract nursing costs were being classified as permanent staff costs, creating an apparent margin improvement that did not exist. Decisions had been made on that phantom improvement.
The CFO had assumed that because the dashboard looked clean and updated automatically, someone else was responsible for validating the underlying categorization logic. The vendor assumed the client's finance team was reviewing the category mappings. Nobody was. This is the governance gap that appears most often in real AI deployments: the assumption that automation equals accuracy. It does not. Automation means the process runs faster and at scale, which means errors also propagate faster and at scale. The CFO's team implemented a quarterly data validation process after the incident, where a finance analyst manually reconciles a sample of AI categorizations against the source system records. A simple control. It should have been there from day one.
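A reconciliation like this is most useful when it reports which category pairs disagree, because a repeated pair points to a systematic mapping problem rather than a one-off slip. The sketch below assumes both the AI categories and the source-system categories can be exported as simple lookups keyed by entry ID; the function name, IDs, and category labels are hypothetical.

```python
from collections import Counter


def reconcile(sample_ids, ai_categories, source_categories):
    """Count mismatches in the sample, keyed by (source category, AI category)."""
    mismatches = Counter()
    for entry_id in sample_ids:
        source_cat = source_categories[entry_id]
        ai_cat = ai_categories[entry_id]
        if ai_cat != source_cat:
            mismatches[(source_cat, ai_cat)] += 1
    return mismatches


# Hypothetical sample of four labor-cost entries from one facility.
source = {"E1": "contract nursing", "E2": "contract nursing",
          "E3": "permanent staff", "E4": "contract nursing"}
ai = {"E1": "permanent staff", "E2": "permanent staff",
      "E3": "permanent staff", "E4": "contract nursing"}

result = reconcile(["E1", "E2", "E3", "E4"], ai, source)
for (source_cat, ai_cat), count in result.most_common():
    print(f"{count} entries: source says '{source_cat}', AI says '{ai_cat}'")
```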
Build the Governance Before You Deploy the Tool
Making Governance Practical for Your Finance Team
Governance sounds like a word that belongs in a board meeting or a regulatory filing. In practice, for a finance team of five people or fifty, it means three concrete things: knowing what the AI is doing, having a plan when it gets something wrong, and being able to show an auditor or regulator exactly how decisions were made. That last point, the audit trail, is increasingly important as regulators in the US and Europe begin requiring explainability for AI-assisted financial decisions. The EU AI Act, which entered into force in August 2024 and phases in its obligations over the following years, classifies AI systems used to evaluate the creditworthiness or credit score of individuals as high-risk, requiring documentation of how decisions are made and how errors are caught.
For non-technical finance professionals, building an audit trail does not require understanding how the AI model works. It requires documenting the human decisions made around the AI. Which outputs did your team review? What did you change, and why? Who approved the final number? These are the same questions a good controller has always asked about any financial process; the AI just adds a new step in the chain that also needs to be documented. Tools like Microsoft Copilot in Excel keep version histories. ChatGPT's conversation logs can be exported. Many enterprise AI tools in finance (Workiva, Sage Intacct AI features, Oracle Fusion Analytics) include built-in audit logging. Use it.
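Where a tool has no built-in logging, the three questions above can be captured in a small structured record. The sketch below is one hypothetical shape for such a record, serialized as a single JSON line that could be appended to a shared log file; the class name, field names, and sample values are assumptions rather than any vendor's format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ReviewRecord:
    """Documents the human decisions made around one AI-assisted output."""
    document: str
    ai_tool: str
    reviewed_by: str
    changes_made: str
    reason: str
    approved_by: str
    # Timestamp is filled in automatically (UTC) when the record is created.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record):
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record))


# Hypothetical entry.
record = ReviewRecord(
    document="Monthly variance report",
    ai_tool="Microsoft Copilot",
    reviewed_by="Senior accountant",
    changes_made="Rewrote pharmaceutical variance explanation",
    reason="Variance was a delayed shipment, not a procurement saving",
    approved_by="Finance director",
)
line = to_log_line(record)
print(line)
```

A spreadsheet with the same columns serves the same purpose; what matters to an auditor is that the reviewer, the change, the reason, and the approver are all named.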
The practical starting point for most finance teams is a one-page AI use policy that covers the tools currently in use, the workflows they touch, the human review requirements, and the escalation path when something looks wrong. This does not need to be a legal document. It needs to be clear enough that a new team member joining next month would understand exactly where AI is involved in the team's work, what their responsibility is when using it, and who to call when the AI produces something that does not look right. That document, simple, specific, owned by a real person, is the foundation of everything else in AI governance.
Goal: Produce a signed, team-approved one-page AI use policy that documents current tools, human review requirements, escalation thresholds, and prohibited uses, ready to present to a manager, auditor, or compliance team.
1. Open a blank document in Word, Google Docs, or Notion and title it 'AI Use Policy, [Your Team Name], [Current Month/Year]'.
2. List every AI tool your team currently uses or has recently tested. Include general tools like ChatGPT or Copilot and any finance-specific tools like Sage AI, Workiva, or QuickBooks AI features.
3. For each tool, write one sentence describing which workflow it is used in, for example, 'ChatGPT Plus: drafting variance explanations for monthly management reports.'
4. For each workflow, identify the specific human review step: who reviews the AI output, and before it goes where? Write this as a named role, not just 'someone on the team.'
5. Define a dollar or volume threshold for automatic human escalation, for example, 'Any AI-flagged invoice over $10,000 requires controller sign-off before payment.'
6. Write one sentence describing how your team will handle it if the AI produces something clearly wrong: who is notified, and what is the fallback process?
7. Add a 'Prohibited Uses' section. List at least two things your team has agreed AI should not do without full human preparation of the final output, such as submitting regulatory filings or approving payments.
8. Name one person as the 'AI Policy Owner' responsible for updating this document when tools or workflows change.
9. Share the document with your team lead or manager for review, and schedule a 30-minute discussion to agree on the contents before the next period close.
Key Principles from Part 2
- AI tools in finance inherit the biases present in their training data: historical lending, expense, or credit data often encodes past discriminatory or flawed decisions that the model will replicate at scale.
- AI hallucination, confidently wrong outputs, is a specific risk in financial reporting, research, and regulatory interpretation. Every AI-generated number or regulatory citation needs verification against the primary source.
- The human-in-the-loop model is the governance standard: AI handles volume, humans set thresholds, review exceptions, and periodically audit patterns to catch systematic errors before they compound.
- Automation does not equal accuracy. Errors in AI-driven finance processes propagate faster and at greater scale than manual errors, which makes periodic sampling and validation controls more important, not less.
- An audit trail for AI-assisted financial decisions is not optional. It is a regulatory requirement in multiple jurisdictions and a basic professional standard. Document what the AI produced, what humans reviewed, and who approved the final output.
- Governance does not require technical expertise. It requires clear ownership, defined thresholds, documented review steps, and a written policy that every team member understands and follows.
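The audit-trail principle above names three things to document: what the AI produced, what humans reviewed, and who approved the final output. One minimal way to capture that is a fixed record with a completeness check, sketched below. The `AuditRecord` fields, the `is_complete` helper, and the sample values are all assumptions for illustration; a real audit log would follow your firm's and regulator's requirements.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry for one AI-assisted deliverable."""
    tool: str
    workflow: str
    ai_output_summary: str   # what the AI produced
    reviewed_by: str         # named person who checked it
    approved_by: str         # named person who signed off
    review_date: str         # ISO 8601 date


def is_complete(record: AuditRecord) -> bool:
    """A record counts as complete only if every field is non-blank."""
    return all(str(value).strip() for value in asdict(record).values())


record = AuditRecord(
    tool="ChatGPT Plus",
    workflow="Monthly variance commentary",
    ai_output_summary="Draft explanation of freight cost variance",
    reviewed_by="A. Reviewer",
    approved_by="B. Controller",
    review_date=date(2024, 1, 31).isoformat(),
)

# One JSON line per record is easy to append to a shared log file.
audit_log_line = json.dumps(asdict(record))
print(is_complete(record), audit_log_line)
```

The same fields work equally well as columns in a shared spreadsheet; the point is that a missing reviewer or approver is detectable rather than silently absent.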
When the Audit Trail Goes Cold: Accountability in AI-Assisted Finance
In 2023, a mid-sized asset management firm in Frankfurt quietly pulled back its AI-assisted portfolio risk tool after a compliance review uncovered something uncomfortable: the model had been assigning low-risk scores to a category of bonds that later defaulted. The tool had been trained on pre-2020 data, missing the interest rate volatility patterns that emerged post-pandemic. Nobody had formally signed off on the model's assumptions. Nobody had documented who approved its use in client-facing risk summaries. When regulators asked for the audit trail, there wasn't one. The financial loss was manageable. The reputational and regulatory exposure was not.
What made this case instructive wasn't the AI error itself; models have always had limitations. What made it serious was the absence of governance. No one owned the decision to deploy the tool. No one had scheduled a review cycle. The AI had drifted from a helpful assistant into an unchecked authority, quietly shaping client outcomes while the humans around it assumed someone else was watching. That assumption, that accountability lives somewhere else, is the central risk of AI adoption in finance and accounting today.
The principle extracted from Frankfurt is blunt: AI tools in finance require the same ownership structure as any financial control. Someone signs off. Someone reviews. Someone is named. Without that structure, you don't have AI-assisted decision-making; you have AI-driven decision-making with a human face on the report cover. Regulators, auditors, and increasingly clients are starting to demand that the difference be made explicit and documented.
What Financial Regulators Are Watching
The Three Failure Modes That Keep Finance Leaders Up at Night
JPMorgan Chase processes millions of documents using AI: contracts, compliance filings, loan agreements. Their COIN platform famously cut 360,000 hours of annual legal review work. But JPMorgan also employs hundreds of people specifically to oversee, audit, and validate AI outputs. That pairing of heavy AI use with equally heavy human review infrastructure reflects a mature understanding of where AI fails in finance. The three failure modes are consistent across organizations: hallucinated numbers presented as facts, stale training data that doesn't reflect current market conditions, and model outputs that optimize for the wrong metric because the original brief was ambiguous.
Each failure mode has a human governance fix. Hallucinated numbers are caught by source-checking protocols: a rule that no AI-generated figure appears in a client document without a verified source cited alongside it. Stale data is caught by model review schedules: quarterly at minimum, monthly for tools touching volatile asset classes. Wrong-metric optimization is caught at the prompt level. When you ask an AI tool to 'summarize financial performance,' you get a different answer than when you ask it to 'identify the three largest variances against budget and explain likely causes.' Specificity in your instructions is a governance act, not just a productivity tip.
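The source-checking rule can be sketched as a pre-release check that blocks any figure lacking a cited source. This is a minimal illustration under stated assumptions: the `Figure` record, the `unsourced_figures` function, and the sample values are hypothetical, and the check only confirms that a source is cited, not that the cited source actually supports the number. That verification remains a human task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Figure:
    """Hypothetical entry for one number appearing in a draft document."""
    label: str
    value: float
    source: Optional[str] = None  # e.g. a ledger export or report reference


def unsourced_figures(figures: List[Figure]) -> List[str]:
    """Return labels of figures with no (or a blank) cited source."""
    return [f.label for f in figures if not (f.source and f.source.strip())]


draft = [
    Figure("Q3 revenue", 4_200_000.0, "GL export, revenue accounts"),
    Figure("Freight variance", -85_000.0),          # no source cited
    Figure("Headcount cost", 1_150_000.0, "   "),   # blank source
]

# Anything listed here should hold the document back until a source is added.
blocked = unsourced_figures(draft)
print(blocked)
```

A draft with a non-empty `blocked` list stays with the preparer; only a draft with every figure sourced moves on to the named reviewer.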
KPMG's internal AI governance framework, published in 2023, formalized exactly this approach. Their guidance requires that any AI-assisted output used in client deliverables be labeled, reviewed by a named professional, and traceable to source data. It's not bureaucracy for its own sake; it's the same standard applied to any analytical work product. The AI doesn't change the accountability standard. It changes how fast you can produce the work that must meet that standard.
Prompt
I manage a 6-person finance team at a mid-sized manufacturing company. We've started using ChatGPT to help with variance analysis write-ups, budget commentary, and summarizing board reports. I need a simple, one-page internal policy that sets rules for how the team should use AI tools, covering what's allowed, what requires human review before sharing, and what's off-limits. Write it in plain language our team will actually follow.
AI Response
**Finance Team AI Use Policy: Internal Guidelines**
**What AI tools we use:** ChatGPT (free or Plus), Microsoft Copilot in Excel and Outlook. No other AI tools without manager approval.
**What you CAN use AI for:**
- First drafts of variance commentary and budget narratives
- Summarizing long documents or meeting notes
- Generating options for how to structure a report or presentation
- Proofreading and improving clarity of written work
**What ALWAYS requires human review before sharing:**
- Any document containing financial figures, percentages, or named variances
- Any output going to the CFO, board, auditors, or external parties
- Summaries of contracts, compliance documents, or policy materials
**What is OFF-LIMITS:**
- Entering actual customer data, employee salary data, or confidential contract terms into any AI tool
- Using AI output as a final answer without checking the numbers against source data
- Presenting AI-generated content as your own analysis without review
**The review rule:** If it has a number in it, you check the number. If it's going outside the team, a second person reads it.
**Who owns this policy:** [Finance Manager name]. Review date: [Quarterly]. Questions? Raise them in our weekly team check-in before using a new tool or workflow.
How a Regional Hospital Network Got This Right
Finance governance lessons don't only come from banks and consulting firms. A regional hospital network in the US Midwest began using AI tools in 2022 to assist with revenue cycle management: predicting claim denials, flagging billing anomalies, and summarizing payer contract terms. Their CFO made one structural decision that shaped everything else: every AI-assisted recommendation required a named human reviewer before any action was taken. Not a department. A person. With their name on the output.
Within 18 months, that single rule had created an unexpected benefit beyond compliance. The reviewers started noticing patterns in where the AI was wrong. They documented those patterns. The documentation became training material. New finance staff learned faster because the error patterns were cataloged. The governance process, which had felt like friction at first, became a knowledge asset. Accountability structures in AI, when designed well, don't slow teams down permanently. They build institutional memory.
| Risk Area | What Can Go Wrong | Human Governance Fix | Tools That Help |
|---|---|---|---|
| Hallucinated figures | AI invents numbers that look plausible | Source-check rule: every figure needs a cited source | ChatGPT with browsing, Copilot in Excel |
| Stale training data | Model reflects outdated market or regulatory context | Scheduled model/tool review, quarterly minimum | Vendor release notes, internal review calendar |
| Data privacy breach | Confidential data entered into public AI tool | Banned data list: no client PII, salary, or contract data in free tools | Microsoft Copilot (enterprise), private instances |
| Unclear accountability | No one owns the AI output or its errors | Named reviewer policy: every AI output has a human signatory | Document templates with reviewer fields |
| Prompt ambiguity | Vague instruction produces misleading summary | Prompt standards: require specificity in financial queries | Internal prompt library, team training |
The CFO Who Turned Governance Into a Competitive Edge
A CFO at a 200-person SaaS company in Austin started requiring that every AI-assisted financial analysis include a brief 'confidence note': two or three sentences written by the analyst explaining what the AI did, what data it used, and what the analyst personally verified. It added maybe ten minutes to each deliverable. Within a quarter, the board started commenting on how unusually clear and well-sourced the financial presentations were. Investors began asking if the company had changed its reporting process. It had, but not by adding more AI. By adding more human judgment, visibly.
That CFO understood something counterintuitive: in a world where AI-generated content is everywhere, documented human judgment is a differentiator. Stakeholders (boards, auditors, clients, regulators) are increasingly sophisticated about AI use. Showing your work isn't just a compliance act. It's a trust signal. The finance teams that will earn the most credibility over the next five years are the ones that can say, clearly and specifically, what their AI did and what their humans decided.
Start With One Rule, Not a Full Policy
Making Governance Practical Without Making It Painful
Governance sounds like a compliance department word. In practice, for a finance or accounting professional using AI tools day-to-day, it means three things: knowing what your AI tool can and cannot see, knowing what a human must check before the output leaves your desk, and knowing who to call when something looks wrong. None of that requires a legal background or a technology degree. It requires the same professional judgment you already apply to spreadsheets, vendor invoices, and expense reports.
The practical starting point is a short personal checklist: not a corporate policy document, just your own working rules. What data will you never put into a free AI tool? What outputs will you always verify before forwarding? What's the name of the person who should know if an AI tool you're using starts producing results that seem off? Those three questions, answered and written down, are a governance framework. Imperfect, improvable, but real.
The finance professionals who navigate AI risks well aren't the ones who avoid AI or the ones who trust it completely. They're the ones who treat AI the way a good analyst treats a junior colleague: useful, fast, occasionally wrong, and always requiring a senior eye before the work goes out the door. That mental model, AI as a capable but supervised contributor, is the most practical governance tool available to any non-technical professional right now.
Goal: Create a practical, one-page personal policy for how you will use AI tools in your finance or accounting work, covering permissions, review rules, and data boundaries.
1. Open ChatGPT (free at chat.openai.com) or Claude (free at claude.ai) and start a new conversation.
2. Describe your role in one sentence: your job title, your team size, and the main financial tasks you handle (e.g., budget reports, invoices, forecasting commentary).
3. Ask the AI to draft a personal AI use checklist for someone in your role, covering what AI can help with, what always needs human review, and what data should never be entered into a free AI tool.
4. Read the draft and edit it to reflect your actual work. Add at least two specific examples from your own job: a task you would use AI for, and a data type you would never share with it.
5. Add a 'Review Rule' line: write the name of one colleague or manager who should review AI-assisted outputs before they go to external stakeholders.
6. Add a 'Red Flag Rule' line: write one specific sign that would tell you an AI output needs to be double-checked (e.g., 'any variance figure I don't recognize from the source data').
7. Save the checklist as a document titled '[Your Name]: AI Use Guidelines' and share it with one colleague for feedback.
8. Set a calendar reminder to review and update the checklist in 90 days.
9. Note one thing you will do differently in your AI use starting this week based on what you wrote.
- AI errors in finance aren't rare edge cases. Hallucinated figures, stale data, and miscalibrated models are predictable failure modes that require designed-in human checks.
- The absence of an audit trail is itself a governance failure. If you can't explain who approved an AI tool's use and who reviewed its output, you are exposed, regardless of whether the output was correct.
- Named accountability beats departmental accountability. A person's name on a reviewed AI output creates responsibility and builds institutional knowledge about where errors occur.
- Data privacy is the non-negotiable boundary. Client PII, salary data, and confidential contracts must never enter free, public AI tools. This is a rule, not a guideline.
- Prompt specificity is a governance act. Vague instructions produce outputs that are harder to verify and more likely to mislead. Specific, well-constructed prompts reduce the governance burden on the back end.
- Governance structures, when designed well, generate institutional memory: documented error patterns become training assets that make teams faster and more accurate over time.
- Visible human judgment is a competitive differentiator. In a world of AI-generated content, stakeholders increasingly value and reward clearly documented human review.
Key Takeaways
- AI governance in finance is not a technology problem. It is a professional accountability problem with practical, non-technical solutions.
- The three core governance actions any finance professional can take today: name a reviewer, define your data boundaries, and build a source-check habit.
- Regulators such as the SEC and the FCA, along with legislation such as the EU AI Act, are setting and enforcing standards for AI use in financial services. Documentation and explainability are the baseline expectations.
- The mental model that works: treat AI as a capable but supervised contributor, not an autonomous authority.
- Start with one rule, not a perfect policy. The best governance framework is the one your team will actually follow.
