Safer Communities: Better Decisions in Public Safety
AI in Public Safety and Criminal Justice
Part 1: Promise, Peril, and the People in Between
When the Algorithm Got It Wrong: The Robert Williams Case
Historical Record
Robert Williams
In January 2020, Robert Williams was arrested in Detroit after being misidentified by a facial recognition system in a retail theft case. The arrest marked the first documented wrongful arrest in the United States directly caused by facial recognition AI.
The case became a critical flashpoint in debates about algorithmic bias and accountability in criminal justice systems.
The Detroit case wasn't an isolated glitch. Researchers at MIT Media Lab had flagged the underlying problem two years earlier. Their 2018 Gender Shades study found that commercial facial analysis systems misclassified darker-skinned women at error rates of up to roughly 34%, compared with under 1% for lighter-skinned men. The technology was trained predominantly on faces that didn't represent the full population. When deployed in high-stakes policing contexts, those errors didn't just produce bad data. They produced handcuffs.
The Williams case became a flashpoint in a much larger debate that public sector professionals are now navigating every day: AI tools can genuinely help overstretched police departments, courts, parole boards, and emergency services do more with limited resources. But the same tools can encode and amplify existing inequities at scale, moving faster than any human oversight system was designed to handle. Understanding that tension, not just the benefits or the risks in isolation, is what this lesson is about.
The Core Tension in Public Safety AI
Predictive Policing: The Story of PredPol in Los Angeles
In 2011, the Los Angeles Police Department became one of the first major departments in the world to deploy a predictive policing tool called PredPol, later rebranded as Geolitica. The software analyzed historical crime data and generated maps highlighting 500-by-500-foot boxes where crimes were statistically more likely to occur on a given day. Officers were directed to patrol those boxes more frequently. LAPD leadership reported early results they found promising: certain property crime rates dropped in pilot precincts. The tool spread to dozens of departments across the US and UK within a few years.
The principle PredPol was built on sounds reasonable: if you know where crime historically clusters, you can deploy limited patrol resources more efficiently. The flaw, which researchers and civil liberties organizations began documenting almost immediately, was circularity. The historical crime data the algorithm learned from reflected where police had previously patrolled, not necessarily where crime actually occurred most. Neighborhoods that had been over-policed historically generated more arrest records, which fed the algorithm, which sent more officers there, which generated more records. The tool wasn't predicting crime. In many cases, it was predicting policing.
In 2020, Santa Cruz, California, became the first US city to ban predictive policing outright. Los Angeles ended its use of PredPol the same year, citing budget constraints after years of criticism from community groups. A 2021 investigation by The Markup and Gizmodo later found that the tool's predictions disproportionately targeted low-income Black and Latino neighborhoods. Geolitica shut down entirely in 2023. The lesson public sector managers draw from this story isn't that data-driven policing is inherently wrong. It's that the quality and history of your input data determine the fairness of your output, and no dashboard makes that visible unless someone specifically looks for it.
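For readers who want to see the circularity in numbers, the loop can be sketched as a toy simulation. This is a minimal illustration under loudly stated assumptions, not a model of PredPol itself: the two areas, the starting record counts, the 7-to-3 patrol split, and the function name `simulate_patrol_feedback` are all hypothetical. Both areas have exactly the same true number of incidents. The only difference is that area A starts with more recorded incidents because it was patrolled more in the past.

```python
def simulate_patrol_feedback(recorded, true_incidents, rounds,
                             hotspot_units=7, other_units=3):
    """Toy model: patrols follow the records, and records follow the patrols.

    recorded        -- hypothetical starting incident records per area
    true_incidents  -- hypothetical true incidents per area per round
    hotspot_units   -- patrol units sent to the top-ranked area (assumption)
    other_units     -- patrol units sent to every other area (assumption)
    """
    recorded = dict(recorded)  # work on a copy
    total_units = hotspot_units + other_units * (len(recorded) - 1)
    for _ in range(rounds):
        # The "forecast": whichever area has the most records is the hotspot.
        hotspot = max(recorded, key=recorded.get)
        for area in recorded:
            units = hotspot_units if area == hotspot else other_units
            # Incidents only enter the records where officers are present.
            recorded[area] += true_incidents[area] * units // total_units
    return recorded


start = {"A": 60, "B": 40}            # A was historically patrolled more
same_true_rate = {"A": 100, "B": 100}  # identical underlying crime
final = simulate_patrol_feedback(start, same_true_rate, rounds=10)

start_share = start["A"] / sum(start.values())
final_share = final["A"] / sum(final.values())
print(f"A's share of records: {start_share:.2f} -> {final_share:.2f}")
# A's share of records: 0.60 -> 0.69
```

Even though the two areas are identical in this sketch, the records increasingly "confirm" that A is the problem area. A manager reading only the dashboard would see a forecast that appears to be working.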
Prompt
I'm a police captain preparing a briefing for city council on whether our department should adopt an AI-powered crime forecasting tool. Council members are not technical experts. I need a 3-minute verbal briefing that covers: what these tools actually do, two documented benefits from other departments, two documented risks or failures, and three questions we should require any vendor to answer before we sign a contract. Keep the language clear and non-technical.
AI Response
Good evening, Council Members. Thank you for the opportunity to address this directly.

Crime forecasting tools use historical data (past incident reports, call logs, time of day, location) to flag areas or time windows where certain types of crime are statistically more likely. Think of it like a weather forecast for patrol deployment. The goal is to help us put officers where they're most needed, rather than relying purely on intuition or tradition.

Departments in cities including New Orleans and Chicago have reported reductions in response times and more efficient overnight patrol coverage after adopting similar systems. Some have documented measurable drops in property crime in targeted areas during pilot periods.

However, two serious failures deserve your attention. First, in Los Angeles, a predictive tool was found to send officers repeatedly to the same low-income neighborhoods because its training data reflected historical over-policing, creating a feedback loop rather than a genuine forecast. Second, in Detroit, a separate AI tool produced a wrongful arrest because the underlying facial recognition system had documented accuracy gaps across racial groups.

Before we sign any contract, I recommend we require vendors to answer three questions. One, what data was this tool trained on, and how recent is it? Two, has this tool been independently audited for racial or geographic bias, and can we see that audit? Three, what role does the tool play in decision-making: does it inform officers, or does it direct them?

Those three questions won't guarantee a safe procurement. But they'll tell us immediately whether a vendor has thought seriously about the risks, or whether they're hoping we haven't.
Risk Assessment in Sentencing: The COMPAS Story
In 2016, ProPublica published an investigation into a tool called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), used by judges in Wisconsin and dozens of other states to inform sentencing and parole decisions. COMPAS generated a risk score, from one to ten, estimating how likely a defendant was to reoffend. Judges weren't required to follow the scores, but the scores appeared in pre-sentencing reports. ProPublica's analysis of 7,000 defendants in Broward County, Florida, found that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to have been falsely flagged as high-risk.
The company behind COMPAS, Northpointe, disputed ProPublica's methodology. Researchers published competing analyses. The debate turned on competing statistical definitions of fairness, several of which proved mathematically impossible to satisfy at the same time when groups reoffend at different rates, and it ran for years in academic journals. But for the courtroom professionals in the middle of this (judges, public defenders, prosecutors, probation officers), the practical question was simpler and more urgent: when a defendant's attorney asked what factors produced a particular risk score, Northpointe refused to disclose the algorithm, citing proprietary information. The Wisconsin Supreme Court ultimately ruled that using COMPAS scores was constitutional, but the lack of transparency remained a problem that no court ruling could fully resolve.
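The disagreement over fairness definitions is easier to follow with a small worked example. The sketch below uses invented counts for two groups, not the Broward County data, and the class name `Outcomes` and every number in it are assumptions chosen only to make the arithmetic clean. It shows how a tool can look fair on one measure (a high-risk flag is right equally often for both groups) while looking unfair on another (people who do not reoffend are wrongly flagged far more often in one group).

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Hypothetical counts for one group of 1,000 scored defendants."""
    true_pos: int   # flagged high-risk, did reoffend
    false_pos: int  # flagged high-risk, did NOT reoffend
    false_neg: int  # flagged low-risk, did reoffend
    true_neg: int   # flagged low-risk, did NOT reoffend

    def base_rate(self) -> float:
        """Share of the whole group that actually reoffended."""
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.false_neg) / total

    def false_positive_rate(self) -> float:
        """Of people who did not reoffend, the share wrongly flagged."""
        return self.false_pos / (self.false_pos + self.true_neg)

    def positive_predictive_value(self) -> float:
        """Of people flagged high-risk, the share who did reoffend."""
        return self.true_pos / (self.true_pos + self.false_pos)


# Invented numbers: group A reoffends at 50%, group B at 30%.
group_a = Outcomes(true_pos=300, false_pos=200, false_neg=200, true_neg=300)
group_b = Outcomes(true_pos=150, false_pos=100, false_neg=150, true_neg=600)

for name, g in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: base rate {g.base_rate():.2f}, "
          f"flag is correct {g.positive_predictive_value():.2f}, "
          f"false positive rate {g.false_positive_rate():.2f}")
# Group A: base rate 0.50, flag is correct 0.60, false positive rate 0.40
# Group B: base rate 0.30, flag is correct 0.60, false positive rate 0.14
```

In this illustration a vendor could truthfully say the flag is "equally accurate" for both groups, and a critic could truthfully say group A is wrongly flagged almost three times as often. Both are reading the same table, which is why asking a vendor which fairness measure they tested matters more than asking whether the tool is "fair."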
Comparing AI Applications Across Public Safety Functions
| Application | What It Does | Documented Benefit | Documented Risk | Current Status |
|---|---|---|---|---|
| Predictive Policing (e.g., PredPol/Geolitica) | Forecasts high-crime areas using historical incident data | More efficient patrol deployment in some pilot studies | Feedback loops amplifying over-policed neighborhoods | Geolitica shut down 2023; bans in multiple US cities |
| Facial Recognition (e.g., Clearview AI) | Matches photos against large databases to identify suspects | Helped solve cold cases and locate missing persons | High error rates for darker-skinned individuals; wrongful arrests documented | Banned or restricted in many jurisdictions; ongoing litigation |
| Risk Assessment Tools (e.g., COMPAS, PSA) | Scores defendants' likelihood of reoffending or missing court | Reduces reliance on purely subjective judicial intuition | Racial disparities in false-positive rates; opacity of scoring factors | Widely used but increasingly scrutinized; some jurisdictions phasing out |
| 911 Dispatch AI (e.g., RapidSOS, Amazon Alexa Emergency) | Prioritizes calls, routes responders, surfaces caller data in real time | Faster response times; better data for dispatchers under pressure | Over-reliance risk; edge cases where AI misclassifies urgency | Growing adoption; generally lower controversy than policing tools |
| Gunshot Detection (e.g., ShotSpotter) | Acoustic sensors + AI to detect and locate gunfire automatically | Faster police response to shootings in some urban deployments | High false-positive rates documented; expensive; privacy concerns | Active in 100+ US cities; contracts canceled in Chicago and others |
| Recidivism Prediction for Parole Boards | Assesses likelihood of reoffense to inform release decisions | Adds structured data to supplement parole board discussions | Proxies for race embedded in socioeconomic factors used as inputs | Used in multiple states; civil rights challenges ongoing |
Emergency Dispatch: Where AI Is Doing Quieter, Less Contested Work
Not every AI application in public safety lands in a courtroom or a civil rights lawsuit. In emergency dispatch centers, some of the highest-stress, highest-volume workplaces in the public sector, AI tools have been doing genuinely useful work with considerably less controversy. RapidSOS, used by more than 5,000 emergency communication centers across the US, pulls real-time data from a caller's phone (GPS coordinates, device health data, sometimes crash sensor information from connected vehicles) and surfaces it directly in the dispatcher's interface alongside the 911 call. Dispatchers don't have to change how they work. The data just appears, reducing the time spent asking 'where are you exactly?' when a caller is panicked or disoriented.
The contrast with predictive policing and risk scoring tools is instructive. RapidSOS augments a human decision-maker with better real-time information but does not make the decision. The dispatcher still decides how to classify the call, which units to send, and how to prioritize competing emergencies. The AI reduces friction and fills data gaps; it does not replace judgment. Public sector managers evaluating AI tools for their agencies can use this as a working framework: tools that give humans better information tend to generate far fewer problems than tools that generate scores or recommendations that influence consequential decisions about individual people.
The 'Inform vs. Decide' Test
What This Means for Public Sector Managers Right Now
If you manage a team in policing, courts, corrections, emergency services, or any adjacent public sector role, you are almost certainly going to be asked to evaluate, implement, or defend an AI tool within the next two to three years, if you haven't been already. Budget pressures, staffing shortages, and vendor marketing are all pushing in the same direction. The cases in this lesson aren't cautionary tales meant to make you say no to everything. They're a map of where the terrain gets dangerous, so you can move through it carefully rather than stumbling in.
The professionals who navigated these situations badly weren't malicious. The Detroit investigators who accepted a facial recognition match without further verification, and the judges who cited COMPAS scores in sentencing without understanding what produced them, were busy, under-resourced, and told by vendors and leadership that the technology was reliable. The gap between what AI vendors claim and what independent audits find is not a minor technical footnote. It is the central management problem. Your job isn't to become a data scientist. It is to know which questions to ask, who should be asking them, and what an acceptable answer looks like.
Across the rest of this lesson, you'll build a practical vocabulary for evaluating these tools, covering bias audits, transparency requirements, community engagement, and the specific red flags that experienced procurement officials now look for in vendor contracts. You'll also look at jurisdictions that have gotten this right: places that adopted AI tools thoughtfully, built in accountability structures from day one, and were able to demonstrate public benefit without producing the kind of documented harm that ended PredPol (later Geolitica). The goal is to give you a working framework you can apply to a real decision on your desk.
Goal: Identify one AI tool currently used or under consideration in your agency or department, and apply a basic accountability framework to assess its risk profile, without needing any technical background.
1. Identify one AI tool your agency currently uses or is considering. This could be a facial recognition system, a risk scoring tool, a dispatch platform, a records management system with AI features, or even a general tool like Microsoft Copilot used by staff. Write down its name and what it is supposed to do in one or two sentences.
2. Open ChatGPT Plus or Claude and paste this prompt: 'I work in [your role] at a [type of agency]. We use/are considering [tool name]. Summarize what this tool does, any documented controversies or accuracy concerns, and three questions I should ask the vendor about fairness and transparency. Keep the language non-technical.'
3. Read the AI's response carefully. Highlight or note any risks or controversies you were not previously aware of.
4. Search the vendor's website for the words 'bias audit,' 'independent evaluation,' or 'accuracy study.' Note whether you find anything, or find nothing.
5. Check whether your jurisdiction (city, county, or state) has any existing AI governance policies, algorithmic accountability ordinances, or procurement guidelines. A quick search for '[your city/state] algorithmic accountability policy' will often surface this. Write down what you find.
6. Write three questions you would want answered before your agency expanded its use of this tool. Use the 'Inform vs. Decide' framework from this lesson to assess which category the tool falls into.
7. Draft a one-paragraph summary you could share with a supervisor or colleague explaining the tool, one key benefit, one documented or potential risk, and your recommended next step, whether that's proceeding, requesting an audit, or pausing until more information is available.
8. Save this summary. You'll use it as a reference case study in Part 3 of this lesson when we cover governance frameworks and procurement checklists.
9. Optional: If your agency does not currently use any AI tools you're aware of, complete this task using ShotSpotter as your case study. Search 'ShotSpotter accuracy criticism' and 'ShotSpotter Chicago contract' to gather the background you need.
Key Lessons from Part 1
- AI tools in public safety are already deployed at scale, in policing, courts, corrections, and emergency dispatch, and decisions about them are being made by managers, not technologists.
- The Robert Williams wrongful arrest case established a documented real-world cost of deploying facial recognition AI without adequate human verification protocols.
- Predictive policing tools like PredPol can create feedback loops when trained on historically biased data, sending more officers to already over-policed communities rather than genuinely forecasting crime.
- Risk assessment tools like COMPAS have documented racial disparities in false-positive rates, and their opacity makes it difficult for defendants, attorneys, or judges to challenge individual scores.
- Not all public safety AI is equally controversial: tools that give dispatchers better real-time information (like RapidSOS) have a meaningfully different risk profile than tools that score or rank individual people.
- The 'Inform vs. Decide' test is a practical first-pass filter any manager can apply: tools that augment human judgment tend to cause fewer documented harms than tools that produce scores or recommendations appearing in official records.
- Vendor claims about accuracy and fairness frequently diverge from independent audit findings, and the gap between the two is the core management problem, not a technical footnote.
When Algorithms Meet the Courtroom
In 2013, a Wisconsin man named Eric Loomis was sentenced to six years in prison. The judge cited a risk assessment tool called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) in the sentencing decision. Loomis challenged the sentence, arguing he had a right to know how the algorithm scored him. The Wisconsin Supreme Court disagreed. The score, generated by software from a private company then called Northpointe (now Equivant), was treated as one factor among many. But the case exposed something the public sector had quietly been doing for years: using algorithmic scores to inform decisions about human freedom without full transparency about how those scores were built.
ProPublica's 2016 investigation into COMPAS became one of the most cited pieces of journalism in AI ethics. Reporters analyzed the records of over 7,000 people arrested in Broward County, Florida, and found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high risk for future crime. White defendants were more likely to be incorrectly labeled low risk and then go on to reoffend. Northpointe disputed the methodology. Academic researchers disputed Northpointe's response. The debate continues today. What nobody disputes: the tool was making consequential predictions about people, and very few of the judges, probation officers, or defendants involved fully understood how it worked.
This is the core tension at the heart of AI in criminal justice. The tools are real, deployed, and influential. The people using them, and subject to them, often lack the technical background to scrutinize them. And the stakes are not quarterly targets or marketing budgets. They are years of someone's life. For public sector professionals working in courts, corrections, law enforcement, or policy, understanding what these tools actually do, and what they cannot do, is now a basic professional competency.
The Principle: Automation Doesn't Remove Accountability
What the Loomis case teaches is not that risk assessment tools are inherently wrong. Courts have always used presentence reports, psychological evaluations, and prior record summaries, all of which carry their own biases. The lesson is narrower and more actionable: when a decision-support tool influences a high-stakes outcome, the professional in the room remains accountable for that outcome. The algorithm doesn't go to jail. The judge signs the order. The parole officer approves the release. The hiring manager offers the job. Automation shifts the speed of a decision, not the responsibility for it.
What Is a Risk Assessment Tool?
Predictive Policing: The Los Angeles Story
The Los Angeles Police Department ran a predictive policing program called PredPol from 2011 to 2020. The software analyzed historical crime data and generated maps showing 500-by-500-foot boxes where crimes were statistically likely to occur. Officers were directed to patrol those boxes more frequently. The theory was elegant: get ahead of crime instead of responding to it. The LAPD, one of the largest and most technologically resourced departments in the world, was an early adopter and a prominent case study for the vendor.
Researchers and civil liberties organizations, including the Brennan Center for Justice, identified a feedback loop problem, and a 2021 investigation by The Markup and Gizmodo found that the software's predictions fell disproportionately on Black and Latino neighborhoods. When officers patrol a specific area more, they make more arrests there, not necessarily because more crime is occurring, but because more eyes are watching. Those additional arrests then feed back into the algorithm as confirmation that the area is high-crime. The model reinforces its own predictions. Areas that were over-policed historically became over-policed algorithmically. The LAPD discontinued PredPol in 2020 amid broader scrutiny of police surveillance technology. Santa Cruz, California, became the first US city to ban predictive policing software outright.
For a public sector manager, the PredPol story is a masterclass in what happens when a tool's outputs are treated as ground truth rather than as one data point to be interrogated. Nobody at the LAPD was trying to encode discrimination into a spreadsheet. The officers using the maps were following a process their department had approved. The problem was systemic: a failure to ask hard questions about what the historical data actually represented before building policy on top of it. That failure of critical oversight is not a technical failure. It is a leadership and governance failure.
Prompt
I work for a county government evaluating a proposal from a vendor selling an AI-powered risk assessment tool for our pretrial services department. The vendor claims their tool is 'validated' and 'bias-tested.' I am not a data scientist. What questions should I ask the vendor in our next meeting to understand whether this tool is appropriate for our community? Focus on questions a non-technical manager can ask and understand the answers to.
AI Response
These are the questions you should bring to that meeting, and push for clear, non-jargon answers.

**On the data foundation:**
- What historical data was used to train this model, and does it include data from our county or similar communities?
- How old is that data? If it's from 2005, crime patterns, policing practices, and community demographics may have shifted significantly.

**On bias testing:**
- Which demographic groups were tested for disparate impact? Ask specifically about race, gender, age, and zip code.
- What was the false positive rate (people labeled high-risk who did not reoffend), broken down by race? Ask for this number in writing.

**On transparency:**
- Can a defendant, their attorney, or a judge see which specific factors drove their score?
- Has this tool been audited by an independent third party? Can we see that report?

**On accountability:**
- If this tool produces a score that leads to a wrongful detention and we're sued, what is your company's liability?
- What happens when the tool is wrong? What is the appeals or correction process?

**On ongoing performance:**
- How often is the model retrained with new data?
- Will we receive regular accuracy and disparity reports after deployment?

If the vendor cannot answer the bias and transparency questions clearly, that is your answer about readiness.
A Different Sector, The Same Problem: AI in Hiring
In 2018, Reuters reported that Amazon had scrapped an internal AI recruiting tool it had been developing since 2014. The system was designed to review resumes and score candidates on a scale of one to five stars. Engineers discovered the model had learned to penalize resumes that included the word 'women's' (as in 'women's chess club') and to downgrade graduates of two all-women's colleges. The model had been trained on ten years of Amazon's own hiring data. That data reflected a decade of a male-dominated tech industry. The algorithm learned the pattern: people who look like our current employees get hired. It then automated that pattern at scale.
The parallel to COMPAS and PredPol is direct. Historical data reflects historical decisions, which reflect historical biases. Feed that data into a machine learning model and you get a very fast, very confident version of those same biases, now dressed in the authority of mathematics. Amazon's engineers caught the problem before deployment. Many organizations, in both the private and public sector, have not been so careful. For HR professionals, recruiters, and hiring managers using AI-assisted screening tools today, the Amazon story is a standing reminder: automated does not mean neutral, and efficient does not mean fair.
Comparing AI Tools Across Public Safety Contexts
| Tool Type | Example Products/Programs | Primary Use | Key Risk | Who Is Accountable |
|---|---|---|---|---|
| Pretrial Risk Assessment | COMPAS, PSA (Public Safety Assessment), ORAS | Bail and detention decisions | Racial disparities in false positive rates | Judges, pretrial services officers |
| Predictive Policing | PredPol (later Geolitica), ShotSpotter, HunchLab | Patrol deployment, crime forecasting | Feedback loops amplifying over-policing | Police chiefs, city leadership |
| Facial Recognition | Clearview AI, Amazon Rekognition, NEC NeoFace | Suspect identification, surveillance | High error rates for darker skin tones | Investigators, department leadership |
| Gunshot Detection | ShotSpotter (now SoundThinking) | Acoustic detection of gunfire | False positives, privacy concerns | Dispatchers, responding officers |
| Parole/Probation Supervision | BI SmartLINK, Shadowtrack | Remote monitoring, compliance tracking | Algorithmic errors triggering violations | Probation officers, courts |
| AI-Assisted Report Writing | Axon Draft One, Microsoft Copilot for Police | Drafting incident reports from body cam audio | Accuracy of transcription, legal admissibility | Reporting officer, supervisor |
The Prosecutor's Office That Got It Right
Not every story in this space is a cautionary tale. The Mecklenburg County District Attorney's Office in North Carolina, covering Charlotte, piloted an approach that put human oversight at the center of AI use rather than treating it as an afterthought. When prosecutors began using data analytics tools to review case histories and charge patterns, the office paired the rollout with mandatory training for every staff member who would interact with the outputs. Crucially, they established a clear policy: algorithmic outputs could inform recommendations but could not replace prosecutorial discretion. Every decision still required a named human being to own it.
The office also published quarterly reports on charge patterns broken down by demographics, not because the law required it, but because leadership decided that transparency was the only credible defense against the allegation that they were hiding behind algorithms. This kind of proactive accountability is rare. It requires leadership that understands both the utility of AI tools and their potential for harm. It also requires a culture where staff feel safe flagging when a tool's output doesn't match their professional judgment. In a hierarchical environment like a DA's office, creating that culture is harder than buying the software, and more important.
The Practical Test for Any AI Tool in Your Department
Putting This to Work in Your Role
For most public sector professionals reading this, the immediate application is not building an AI system. It is evaluating, overseeing, or using one that someone else built. That is a specific skill set, and it is different from technical AI development. It involves reading vendor documentation critically, asking the right questions in procurement meetings, designing internal policies that govern how AI outputs are used, and building review processes that keep humans genuinely in the loop, not just nominally. The Mecklenburg County example shows this is achievable without a computer science degree.
If you manage a team that uses AI-assisted tools, whether that's a risk scoring system, an AI-drafted report tool like Axon Draft One, or a scheduling algorithm for patrol deployment, your role is to set the norms. What does your team do when the tool produces a result that surprises them? Do they accept it because the machine said so? Do they know how to flag a concern? Have you made clear that professional judgment still matters and that staff won't be penalized for pushing back on an algorithm? These are management questions, not technical ones. The technology is only as trustworthy as the governance around it.
If you are in a policy, legal, or oversight role, your leverage point is documentation and standards. Requiring vendors to provide bias audits, demanding that departments publish outcome data, and building review cycles into procurement contracts are all practical tools available right now. Several states, including California, Illinois, and New York, have passed or proposed legislation requiring algorithmic impact assessments for government AI tools. Even where legislation doesn't yet exist, you can build equivalent standards into departmental policy. The goal is not to prevent AI adoption. It is to make sure adoption happens with eyes open.
Goal: Conduct a structured accountability review of one AI-assisted tool in your public safety or criminal justice context, producing a one-page summary that identifies risks and recommends governance safeguards.
1. Identify one AI tool currently used or being evaluated in your department. This could be a risk assessment score, a predictive analytics platform, a report-drafting tool, or a scheduling algorithm. Write down its name and primary stated purpose.
2. Using ChatGPT or Claude, enter this prompt: 'I work in [your role] at [type of agency]. We use/are evaluating [tool name]. List 10 accountability questions a non-technical manager should ask before or during deployment of this type of AI tool in a public safety context.'
3. Review the AI-generated questions. Add or modify any that are specific to your community, legal context, or departmental policies. You should end up with a list of 10–15 questions.
4. Research your tool online for 15 minutes. Look for independent audits, academic studies, news investigations, or vendor-published validation reports. Note what you find and what you cannot find.
5. Ask ChatGPT or Claude: 'What are the known accuracy or bias concerns associated with [tool name or tool type]? Summarize what independent researchers have found.' Cross-check at least one claim against a real source before relying on it.
6. Draft a one-page summary using this structure: Tool name and purpose → How it is currently used in your department → Key risks identified → What documentation or audit evidence exists → What governance safeguards are currently in place → What safeguards are missing.
7. Identify one concrete policy change or oversight mechanism your department could implement without a budget increase or technical change, for example, a mandatory human review step, a monthly outcome report, or a staff training requirement.
8. Share your one-page summary with one colleague and ask: 'Is there anything here that surprises you, or anything I missed?' Document their response.
9. Identify the right person in your organization (a supervisor, legal counsel, or a procurement officer) to receive this summary, and plan how you will present it.
Key Lessons from This Section
- Algorithmic tools in criminal justice are already widespread: risk assessments, predictive policing, facial recognition, and AI report writing are deployed in departments across the country right now.
- Automation transfers speed, not accountability. The professional who uses a tool's output in a decision remains responsible for that decision.
- Historical bias in training data becomes algorithmic bias in outputs. This is not a glitch; it is a predictable consequence of training a model on records that reflect past human decisions.
- Feedback loops are a specific and serious risk in predictive policing: increased patrol generates more arrests, which confirms the prediction, which justifies more patrol.
- The same pattern seen in criminal justice AI (biased training data producing discriminatory outputs) appears in AI hiring tools, lending algorithms, and other high-stakes applications.
- Transparency and human override are not optional features. They are the minimum requirements for responsible AI deployment in any context where decisions affect individual rights.
- Non-technical managers can and should lead AI governance. The critical questions (about bias testing, accountability, transparency, and outcome tracking) do not require a data science background.
- Proactive documentation and public reporting, as demonstrated in the Mecklenburg County example, build credibility and create the evidence trail needed to detect and correct problems.
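The feedback loop named in the lessons above can be made concrete with a toy simulation. The sketch below is illustrative only: the district names, the starting counts, and the rule that a single daily patrol goes to whichever district has the most recorded incidents are all assumptions made for this example, not a description of how any real product works.

```python
def simulate_patrol_feedback(recorded, true_rate, days):
    """Send one daily patrol to the district with the most recorded incidents.

    Incidents are only recorded where the patrol is present, so the records
    reflect where officers looked, not where crime actually occurred.
    """
    recorded = dict(recorded)  # copy so the caller's data is untouched
    for _ in range(days):
        target = max(recorded, key=recorded.get)  # the "predicted hotspot"
        recorded[target] += true_rate[target]     # patrol presence creates new records
    return recorded


# Hypothetical numbers: both districts have the same underlying rate,
# but District A starts with one extra recorded incident.
history = {"District A": 11, "District B": 10}
true_rate = {"District A": 2, "District B": 2}

after = simulate_patrol_feedback(history, true_rate, days=30)
print(after)
```

Both districts have the same underlying rate, yet after 30 simulated days District A's record has grown from 11 to 71 while District B's stays at 10. The data now appears to confirm the original prediction, even though the only difference between the districts is where officers were sent.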
When the Algorithm Gets It Wrong: Accountability, Bias, and the Human Override
In January 2020, the Detroit Police Department made headlines for the wrongful arrest of Robert Williams, a Black man detained based on a facial recognition match that was simply incorrect. The AI system flagged him as a suspect in a retail theft case. No human officer independently verified the match before an arrest warrant was issued. Williams spent 30 hours in custody before the error was discovered. Detroit's case became a landmark moment in the public debate about AI in policing, not only because the facial recognition match was wrong, but because the human oversight layer failed completely. The system returned a candidate match, as it was designed to do. The problem was that nobody checked its work.
Detroit's story reveals the central tension in public safety AI: speed versus scrutiny. Predictive tools and facial recognition systems can process data in seconds, and that speed creates pressure to act on results without pausing to question them. Officers get a match, a score, a flag, and the cognitive weight of 'the computer said so' can override standard verification instincts. This is what researchers call automation bias: the tendency to over-trust algorithmic outputs and under-apply human judgment. In low-stakes contexts, automation bias is an inconvenience. In criminal justice, it can destroy lives.
The principle extracted from Detroit isn't 'don't use AI.' It's 'never remove the human from the decision loop.' Every AI output in a public safety context (a risk score, a match result, a predicted hotspot) should be treated as a starting point for human investigation, not a conclusion. The agencies getting this right have built formal review protocols around their AI tools. The ones getting it wrong have let the tool become the decision-maker by default.
What 'Human in the Loop' Actually Means
Bias Isn't a Bug. It's a Data Problem
The COMPAS risk assessment tool, used in courtrooms across the United States to predict reoffending, became the subject of a landmark 2016 investigation by ProPublica. Journalists analyzed thousands of cases and found that the algorithm incorrectly flagged Black defendants as future criminals at nearly twice the rate of white defendants. The company behind COMPAS disputed the methodology, but the core finding stuck: when AI systems are trained on historical criminal justice data, they absorb the biases embedded in that data. Decades of racially disproportionate policing, charging, and sentencing become the training signal. The algorithm learns to replicate those patterns, and then presents them as objective predictions.
This isn't unique to criminal justice. In healthcare, AI diagnostic tools trained predominantly on data from white patients have shown reduced accuracy for patients of color. In hiring, AI screening tools trained on historical hire data have penalized applications from women when past hiring was male-dominated. The pattern is consistent: biased inputs produce biased outputs, regardless of how sophisticated the model is. For public sector professionals, the practical implication is clear: before any AI tool is adopted, ask who the training data represents and whether the outcomes have been audited for disparate impact.
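One common form of disparate impact audit compares error rates across groups, which is the kind of disparity the ProPublica analysis reported. The sketch below is a minimal illustration under stated assumptions: the records are invented, the group labels "X" and "Y" are placeholders, and the field names (`group`, `flagged_high_risk`, `reoffended`) are hypothetical rather than taken from COMPAS or any vendor's data format.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Per group: share of people who did NOT reoffend but were flagged high risk."""
    wrongly_flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for r in records:
        if not r["reoffended"]:
            did_not_reoffend[r["group"]] += 1
            if r["flagged_high_risk"]:
                wrongly_flagged[r["group"]] += 1
    return {g: wrongly_flagged[g] / n for g, n in did_not_reoffend.items()}


def rows(group, flagged, reoffended, count):
    """Build `count` identical hypothetical records."""
    return [{"group": group, "flagged_high_risk": flagged, "reoffended": reoffended}] * count


# Invented audit sample, for illustration only.
records = (
    rows("X", True, False, 4) + rows("X", False, False, 6) + rows("X", True, True, 5)
    + rows("Y", True, False, 2) + rows("Y", False, False, 8) + rows("Y", True, True, 5)
)

rates = false_positive_rate_by_group(records)
ratio = rates["X"] / rates["Y"]
print(rates, ratio)
```

In this invented sample, group X's false positive rate is 0.4 and group Y's is 0.2, so people in group X who did not reoffend were wrongly flagged at twice the rate. A non-technical manager does not need to run this calculation personally, but can ask a vendor to show the equivalent figures for their tool.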
Santa Cruz, California, made history in 2020 by becoming the first U.S. city to ban predictive policing software outright. City council members cited the ProPublica findings and similar research as evidence that the tools couldn't be made fair with the data available. Other jurisdictions, including New York City, have taken a different path, passing laws requiring algorithmic audits and public transparency reports for any AI tool used in government decisions. Neither approach is universally right. But both represent something essential: elected officials and community members asserting democratic oversight over AI systems that affect civil liberties.
Prompt
I'm a policy advisor briefing a city council member who is considering approving a contract for a predictive policing AI tool. She wants a plain-language briefing note, no jargon, covering: what the tool claims to do, the main documented risks around bias and wrongful outcomes, what oversight mechanisms other cities have required, and three questions she should ask the vendor before voting. Keep it under 400 words.
AI Response
**Briefing Note: Predictive Policing AI. Key Considerations**

**What the tool claims to do**
Predictive policing software analyzes historical crime data (incident reports, arrest records, time and location patterns) to forecast where crimes are likely to occur or identify individuals considered at elevated risk of offending. Vendors market these tools as a way to allocate police resources more efficiently.

**Documented risks**
The most significant concern is bias amplification. These tools are trained on historical policing data, which reflects decades of racially disproportionate enforcement. A 2016 ProPublica investigation of the COMPAS tool found Black defendants were flagged as high-risk at nearly twice the rate of white defendants with similar profiles. When officers then patrol predicted areas more heavily, they generate more arrests there, which feeds back into the training data, reinforcing the original bias in a self-fulfilling cycle.

A second risk is wrongful action based on unverified AI outputs. In Detroit, a man was wrongfully arrested after officers acted on a facial recognition match without independent verification. Over-reliance on algorithmic outputs, sometimes called automation bias, is a documented pattern in law enforcement contexts.

**What other cities have required**
New York City passed Local Law 49 requiring annual bias audits of algorithmic tools used in city decisions, with public reporting. San Francisco and Santa Cruz have banned certain AI policing tools entirely pending evidence of fairness. Several jurisdictions now require community input sessions before procurement contracts are signed.

**Three questions to ask the vendor**
1. What data was used to train this tool, and has it been independently audited for racial and demographic disparities in outcomes?
2. What human review process is built into the workflow? Specifically, what must an officer do before acting on a prediction?
3. Can you provide outcome data from other jurisdictions showing whether deployment changed crime rates, and whether disparate impact complaints increased or decreased?
The Transparency Problem in Practice
A Wisconsin defendant named Eric Loomis challenged his sentence in court, arguing that the judge had used a COMPAS risk score as part of the sentencing decision and that he had no way to see how the score was calculated because the algorithm was proprietary. In 2016, the Wisconsin Supreme Court ruled against him. His sentence stood. The case exposed a fundamental tension that has no clean resolution yet: AI tools used in government decisions are often owned by private companies that protect their models as trade secrets. Public accountability and private intellectual property are on a collision course.
This matters for public sector professionals beyond just the courts. Any time a government agency contracts with a private AI vendor (for hiring decisions, benefits eligibility, parole recommendations, or school discipline systems), the same tension applies. The public has a right to understand decisions that affect their lives. Proprietary AI makes that understanding structurally difficult. Advocacy organizations like the Algorithmic Justice League and the AI Now Institute have pushed for 'algorithmic impact assessments' as a standard procurement requirement, similar to environmental impact assessments for construction projects. A small number of jurisdictions have begun adopting this approach.
| City / Jurisdiction | AI Tool Type | Approach Taken | Outcome |
|---|---|---|---|
| Santa Cruz, CA | Predictive policing | Full ban (2020) | First U.S. city to prohibit the technology outright |
| New York City | Multiple algorithmic tools | Mandatory annual bias audits + public reporting | Local Law 49 passed; audits now required across agencies |
| Detroit, MI | Facial recognition | Continued use after policy reform | New policy requires human corroboration before any arrest action |
| Wisconsin (state courts) | COMPAS risk scoring | Permitted with judicial discretion | Loomis v. Wisconsin upheld use; transparency debate ongoing |
| Amsterdam, Netherlands | Social welfare fraud detection | Tool suspended after audit | Algorithm found to disproportionately flag minority households |
Use AI to Prepare Better Questions, Not to Make the Decision
What Responsible Adoption Actually Looks Like
Responsible AI adoption in public safety isn't about being anti-technology. It's about being a disciplined buyer and an accountable operator. Agencies that are getting this right share a few consistent practices. They conduct pilot programs with defined success metrics before full deployment. They require vendors to provide outcome data from comparable jurisdictions. They establish clear policies on what AI outputs can and cannot trigger: a risk score alone cannot authorize an arrest; a facial recognition match alone cannot authorize a warrant. They train frontline staff not just on how to use the tool, but on how to push back on it.
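A rule about what an AI output can and cannot trigger is easier to enforce when it is written down as an explicit check rather than left to habit. The following is a minimal sketch of such a rule, not any agency's actual policy: the action names, the `Lead` fields, and the requirement of at least one piece of corroborating evidence plus a named reviewer are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy: actions an AI output may never trigger on its own.
REQUIRES_CORROBORATION = {"arrest", "warrant_request"}


@dataclass
class Lead:
    source: str  # e.g. "facial_recognition" or "risk_score"
    corroborating_evidence: List[str] = field(default_factory=list)
    reviewed_by: Optional[str] = None  # named human reviewer, if any


def action_permitted(lead: Lead, action: str) -> bool:
    """Allow high-stakes actions only with independent evidence and a human reviewer."""
    if action not in REQUIRES_CORROBORATION:
        return True  # low-stakes follow-up, such as opening an investigation
    return bool(lead.corroborating_evidence) and lead.reviewed_by is not None


match_only = Lead(source="facial_recognition")
verified = Lead(
    source="facial_recognition",
    corroborating_evidence=["witness statement"],
    reviewed_by="Reviewing detective (placeholder)",
)

print(action_permitted(match_only, "arrest"))              # False
print(action_permitted(match_only, "open_investigation"))  # True
print(action_permitted(verified, "arrest"))                # True
```

The design choice worth noting is that the match is never discarded. It can still open an investigation, but it cannot by itself cross the line into an arrest or a warrant request.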
Community engagement is also non-negotiable for responsible deployment. The populations most affected by predictive policing and risk assessment tools, predominantly lower-income communities and communities of color, are rarely consulted before procurement decisions are made. Participatory design processes, community advisory boards, and public comment periods before AI contracts are signed are increasingly being recognized as minimum standards, not optional extras. The Los Angeles Police Department faced sustained public opposition to its Operation LASER predictive policing program; it was eventually discontinued in 2019 after community organizing and a critical inspector general report.
For the non-technical professional working in or adjacent to public safety, whether you're a city manager, a school administrator overseeing security systems, a nonprofit policy analyst, or a communications officer, your role in this ecosystem matters. You don't need to understand the algorithm. You need to understand the accountability questions: Who is affected? Who approved this? What happens when it's wrong? How do we know if it's working fairly? Those questions don't require a data science degree. They require professional responsibility and the willingness to ask them out loud.
Goal: Create a practical one-page checklist a public sector manager or policy advisor could use before approving or renewing a contract for any AI tool used in public safety, hiring, or benefits decisions.
1. Open ChatGPT (free at chat.openai.com) or Claude (free at claude.ai). Both offer a free tier, though you may need to create an account.
2. Type this prompt: 'I work in local government. Help me create a one-page checklist of accountability questions a public sector manager should ask before approving a contract for an AI tool used in public safety or government services. Focus on bias risks, human oversight, transparency, and community impact. Plain language, no jargon.'
3. Review the output. Identify any questions that feel generic and highlight them.
4. Follow up with: 'Make question 3 and question 7 more specific to criminal justice or policing contexts.'
5. Ask the AI to add a section specifically about vendor transparency: 'Add three questions specifically about what data the vendor used to train the model and how they handle audits.'
6. Copy the final checklist into a Word document or Google Doc and add your agency's name and today's date.
7. Share the draft with one colleague and ask them to add or remove one question based on your specific context.
8. Save the final version as 'AI Vendor Accountability Checklist, [Your Agency], [Date]' and keep it in your procurement folder.
9. Note which questions you found hardest to answer. Those gaps are your highest-priority areas for follow-up with any vendor.
- Automation bias is a documented risk in law enforcement: officers tend to over-trust AI outputs and under-apply independent verification, with serious consequences for wrongful arrests and unjust sentencing.
- AI tools trained on historical criminal justice data replicate the biases embedded in that data: racially disproportionate policing patterns become algorithmic predictions presented as objective scores.
- The COMPAS case and the Robert Williams wrongful arrest are not edge cases; they are documented, peer-reviewed, and legally contested examples that define the current state of the field.
- Proprietary AI in government creates a structural accountability gap: when algorithms are trade secrets, defendants, citizens, and even judges cannot scrutinize the basis for decisions affecting civil liberties.
- Jurisdictions are responding differently, from full bans to mandatory audits to human-override requirements, and there is no single universally adopted standard yet.
- Non-technical public sector professionals have a concrete role: asking accountability questions at procurement, demanding bias audits, and ensuring community input before deployment.
- AI tools are legitimately useful for drafting policy frameworks, summarizing research, and preparing sharper vendor questions. The professional's job is to use AI to strengthen human judgment, not outsource it.
Takeaways
- Never treat an AI output in criminal justice as a final decision; it is always a starting point for human verification.
- Bias in AI tools is not a glitch; it is a predictable consequence of training on historically biased data. Ask vendors directly how they have tested and addressed it.
- Transparency is a democratic requirement, not a technical nicety. Proprietary algorithms used in government decisions create accountability gaps that affect civil liberties.
- Your value as a non-technical professional is in asking the right accountability questions at procurement, at deployment, and at every contract renewal.
- Use AI tools like ChatGPT or Claude to help you prepare better frameworks, checklists, and briefing notes, but keep the judgment calls firmly in human hands.
