Giving Examples: Show, Don't Just Tell
Most professionals assume that writing better prompts means writing longer prompts — more detail, more adjectives, more context crammed into a single block of text. That instinct is understandable but wrong. The single highest-impact technique for getting consistently useful output from ChatGPT, Claude, or Gemini isn't length or precision of language. It's examples. Specifically, it's the practice of showing the model what good looks like before asking it to produce anything. This technique has a formal name in AI research — few-shot prompting — but the underlying logic is the same as any good brief you've ever written for a designer, a copywriter, or a new hire: words describe, examples demonstrate. The gap between those two things is where most AI outputs fall short.
Three Beliefs That Are Holding Your Prompts Back
Before getting into what works, it helps to name what doesn't — and more importantly, why. Three specific misconceptions about using examples in prompts show up constantly among professionals who are new to AI tools. Each one leads to a predictable failure mode: outputs that feel generic, miss the tone, or require three rounds of back-and-forth to get right. Identifying these beliefs isn't an academic exercise. Every time you act on a false model of how these systems work, you're paying a tax in time and frustration. The good news is that each misconception has a clean fix, and once you internalize the corrected mental model, your prompts start working the first time far more often.
Myth 1: Detailed Instructions Are Enough
The first myth is that if you describe what you want precisely enough, examples become unnecessary. This belief makes intuitive sense — after all, instructions are how humans communicate requirements to each other. If you tell a colleague to write a 'concise, punchy, data-driven LinkedIn post with a conversational tone and no buzzwords,' you'd expect them to produce something reasonable. The problem is that every word in that instruction — concise, punchy, conversational — means something slightly different to every person, and to every AI model. Claude's definition of 'punchy' is not your definition. ChatGPT's interpretation of 'conversational' may trend toward casual in a way that reads as unprofessional in your industry. Descriptions are ambiguous by nature. Examples are not.
This matters because large language models are pattern-matching machines trained on billions of documents. When you give ChatGPT the word 'concise,' it draws on every piece of text in its training data labeled or associated with that concept — which spans academic papers, marketing copy, legal briefs, and Reddit threads. The model has no idea which of those registers you actually want. An example cuts through all of that ambiguity instantly. When you show the model a LinkedIn post you've written before and say 'write something in this style,' you've just transmitted tone, vocabulary level, sentence rhythm, paragraph length, and emotional register simultaneously — in a way that 200 words of instruction could never fully replicate. Instructions set the boundary. Examples define the target.
A marketing manager at a mid-size SaaS company recently ran an informal test on this. She prompted ChatGPT-4o with detailed style instructions for a product announcement email — seven sentences of precise guidance covering tone, length, structure, and vocabulary. The output was competent but generic. She then ran the same request with a single example email she'd previously written and minimal instructions. The second output matched her brand voice closely enough to use with light editing. She saved roughly 25 minutes of revision. That's a small data point, but it reflects a pattern that holds consistently: one well-chosen example outperforms one paragraph of description, almost every time.
Instructions Describe. Examples Transmit.
Myth 2: You Need Many Examples to See a Benefit
The second misconception is the opposite problem: professionals who do know about few-shot prompting assume they need to provide five, ten, or more examples before the technique starts working. This belief often comes from reading AI research papers, where few-shot experiments frequently use larger example sets. In practice, for the kinds of tasks professionals do every day — drafting emails, summarizing documents, writing reports, generating social content — a single example (called one-shot prompting) produces dramatically better results than zero examples. Two or three examples are usually sufficient for complex or highly stylized tasks. Beyond that, you're often adding noise rather than signal, especially if your examples aren't perfectly consistent with each other.
The research on this is fairly clear. A 2020 paper from OpenAI introducing GPT-3 showed that model performance on many tasks improved sharply from zero-shot to one-shot, then continued improving more gradually with additional examples. For generative tasks like writing, the curve flattens quickly. What matters more than quantity is quality and relevance of examples. One example that closely matches the target task — same format, similar content domain, correct tone — beats five loosely related examples every time. This is a practically important point because collecting good examples takes effort, and knowing that one strong example is sufficient removes a significant barrier to actually using this technique.
The implication for your workflow is straightforward: start building a small personal library of examples you can drop into prompts. This doesn't need to be elaborate. A folder with five to ten pieces of writing you're proud of — a clear client email, a well-structured executive summary, a compelling project proposal — gives you a ready supply of one-shot examples for the most common tasks you use AI for. Notion AI users can store these directly in their workspace alongside prompt templates. Claude and ChatGPT users can save them in a text file or a custom system prompt. The point is that the barrier to using examples is low. You need one. You probably wrote it last week.
Prompt
Here's an example of an executive summary I've written before:

---
The Q3 customer survey reveals three critical gaps in our onboarding experience. Response time expectations are unmet for 61% of new users. In-app guidance is rated 'insufficient' by 44%. Support ticket volume in the first 30 days is 2.3x higher than the industry benchmark. Addressing these gaps through targeted onboarding improvements could reduce first-month churn by an estimated 18%.
---

Now write an executive summary of the following data in the same style:

[Paste your data or report here]
AI Response
The Q4 NPS analysis identifies two compounding issues in our enterprise segment. Onboarding completion rates have dropped 14 points since the August product update. Feature adoption among accounts onboarded post-update is 37% lower than the prior cohort. Support escalations from enterprise accounts now represent 58% of total volume despite being 22% of the user base. Resolving the onboarding regression in this segment is the highest-leverage action available to reverse the NPS decline before the annual renewal cycle.
Myth 3: Examples Only Help with Creative or Stylistic Tasks
The third myth is the most limiting: that examples are a trick for getting AI to match a writing style, and therefore only relevant when tone or creativity is involved. This leads analysts, operations managers, and finance professionals to skip the technique entirely, assuming it doesn't apply to their structured, data-driven work. In reality, examples are just as powerful — sometimes more so — for analytical and structured tasks. When you show Claude an example of how you want data classified, a report section formatted, or a decision framework structured, you're transmitting structural logic that is genuinely difficult to convey through instructions alone. 'Categorize these customer complaints by theme' produces a different schema every time. 'Categorize these customer complaints using the same structure as this example' produces a consistent, reusable output.
Consider how consultants use this in practice. A strategy analyst using ChatGPT-4o to process interview notes can include an example of a previously coded transcript — showing exactly how themes were labeled, how sub-themes nested under main categories, and how ambiguous quotes were flagged. The model replicates that structure across new transcripts with high consistency. Without the example, the analyst gets a different categorization scheme each time, requiring manual normalization before the data is usable. The same principle applies to financial analysts building recurring reports in Gemini Advanced, product managers writing PRD sections in Claude, and operations teams classifying support tickets at scale. Examples don't just improve style. They encode structure, logic, and decision rules.
| Common Belief | What's Actually True | The Practical Fix |
|---|---|---|
| Detailed instructions are sufficient to control output quality | Instructions are inherently ambiguous; different models interpret the same words differently | Add one example that shows the target output — it transmits tone, structure, and style simultaneously |
| You need many examples (5+) before the technique works | One well-chosen example (one-shot) produces significant improvement; 2-3 is sufficient for complex tasks | Start with a single strong example from your own past work; build a small personal library |
| Examples only help with creative or stylistic writing tasks | Examples are equally powerful for structured tasks: classification, formatting, analysis, and reporting | Use examples to encode structure and decision logic, not just tone — especially for recurring analytical work |
| Longer prompts with more detail always produce better outputs | Prompt length matters far less than prompt structure; an example is more information-dense than a paragraph of description | Replace descriptive paragraphs with a single concrete example whenever possible |
| AI models 'remember' your preferences over time within a tool | Each conversation starts fresh unless you're using a custom system prompt or memory feature; examples must be re-supplied | Store your best examples in a reusable template so including them requires minimal effort |
What Actually Works: The Mechanics of Effective Examples
Now that the misconceptions are out of the way, here's the practical model. An effective example in a prompt does three things simultaneously: it shows the model the format you want, signals the vocabulary and register appropriate for your audience, and demonstrates the level of specificity you expect in the output. The best examples are real outputs you've produced yourself — not hypotheticals you've written for the purpose of the prompt. There's a meaningful difference. When you write a fake example to guide an AI, you're approximating what you want. When you paste in something you've actually written and used, you're showing the model the real target. That gap matters more than it sounds, particularly for tone.
The placement of your example within the prompt follows a consistent pattern that works well across ChatGPT, Claude, and Gemini. State the task briefly first — one sentence is fine. Provide the example, clearly delimited so the model knows where it starts and ends (triple dashes, XML tags like `<example>`, or simply a labeled header all work). Then give the model the actual input you want processed, using the same delimiter style. This structure — task, example, input — is more reliable than embedding the example mid-instruction or appending it at the end. It mirrors the natural cognitive sequence: here's what we're doing, here's what good looks like, here's your material. Models trained on human-generated text respond well to this logical flow because it resembles how humans brief each other.
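As a concrete sketch, here is that three-part shape using the XML-tag delimiter style; the tag names and bracketed placeholders are illustrative conventions, not required syntax:

```text
Summarize the report below in the style of the example.

<example>
[a real summary you have written and used]
</example>

<input>
[the report you want summarized]
</input>
```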
One refinement that separates intermediate from advanced prompt writers: annotating your example. Instead of just pasting your example and moving on, briefly label what the model should pay attention to. 'Notice the example uses short paragraphs, starts with the key finding rather than context, and includes a specific number in the first sentence' takes ten seconds to add and substantially improves the model's ability to replicate the features that actually matter to you. This is especially valuable when your example is long and the critical features are subtle. Claude in particular responds well to annotated examples because of how it processes instruction-following tasks — it actively weighs labeled guidance against pattern inference from the example itself, and explicit labels resolve conflicts between the two.
The Three-Part Example Structure
Goal: Produce one AI output that matches your real working style closely enough to use with minimal editing, and create a reusable prompt template for that task.
1. Identify one writing or analysis task you do regularly — a weekly status update, a client-facing summary, a data classification exercise, or a recurring report section.
2. Find a real example of that output that you're genuinely happy with — something you've written and used, not a draft. Copy the full text.
3. Open ChatGPT, Claude, or Gemini (whichever you use most) and start a new conversation.
4. Write a one-sentence task description at the top of your prompt. Be specific about the output type: 'Write a project status update for a senior stakeholder' not just 'Write a status update.'
5. Paste your example below the task description. Add a clear label — 'EXAMPLE:' — and separate it from the rest of the prompt with triple dashes (---).
6. Add 2-3 annotation notes directly after the example label, identifying the specific features you want replicated: sentence length, structure, how data is presented, vocabulary level, or tone markers.
7. Paste your actual current input below a second set of triple dashes with the label 'NOW PROCESS THIS:' and submit the prompt.
8. Evaluate the output against your example. Note which features were replicated accurately and which were missed — this tells you what to annotate more explicitly next time.
9. Save your prompt template (task + example + annotations + input placeholder) in a doc or Notion page so you can reuse it without rebuilding from scratch.
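Assembled, the template these steps produce might look like this; the annotation notes and bracketed placeholders are illustrative:

```text
Write a project status update for a senior stakeholder.

EXAMPLE:
Notice the example uses short paragraphs, opens with the key finding rather
than context, and includes a specific number in the first sentence.
---
[your real status update]
---

NOW PROCESS THIS:
---
[this week's raw notes]
---
```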
Frequently Asked Questions
- Can I use someone else's writing as an example, or does it need to be my own? You can use any writing that represents the style or structure you want — a competitor's press release, a published article, a colleague's email. The model doesn't verify authorship. That said, your own past work tends to produce better results because it reflects your specific voice and standards, not a general approximation of the style you're targeting.
- How long should my example be? Match the length of your example to the length of the output you want. If you're asking for a 150-word email, your example should be a 150-word email. Using a 500-word example when you want a short output often causes the model to produce something longer than you intended — it anchors on the example's length as much as its style.
- What if I don't have a good example of the output I want? Write one — it takes 10 minutes and you'll use it repeatedly. Alternatively, prompt the model to generate three candidate examples based on your description, pick the one closest to your intent, edit it into your ideal version, then use that edited output as your example in a second prompt.
- Does this technique work in Perplexity AI or Notion AI, or only in ChatGPT and Claude? The technique works in any tool that accepts freeform text prompts, including Perplexity, Notion AI, and GitHub Copilot Chat. The underlying models in these tools (often GPT-4 or Claude variants) respond to examples the same way. The main constraint is prompt length limits, which are generous in all current tools — typically 32,000 tokens or more.
- Will including a long example use up my context window and cause the model to lose track of my instructions? In current models — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — context windows are large enough (128,000 to 1 million tokens) that a typical example causes no meaningful degradation. For very long documents, keep your example focused on the relevant section or format rather than pasting an entire 10-page report.
- Should I use different examples for different AI tools, or is one example good across all of them? One well-chosen example works well across all major tools. Subtle differences exist — Claude tends to follow structural examples more precisely, while ChatGPT sometimes injects more variation — but these differences are small compared to the improvement you get from using any example versus none.
Key Takeaways from Part 1
- Detailed instructions leave room for interpretation; examples transmit tone, structure, and vocabulary simultaneously and eliminate ambiguity in ways that description alone cannot.
- One well-chosen example (one-shot prompting) produces significant improvement over zero examples — you don't need five or ten. Quality and relevance of the example matter far more than quantity.
- Examples are as valuable for structured analytical tasks — classification, formatting, report writing — as they are for creative or stylistic work. The technique encodes logic and structure, not just tone.
- Use the three-part prompt structure consistently: task statement, delimited example with annotations, then the actual input. This pattern works reliably across ChatGPT, Claude, and Gemini.
- Annotating your example — briefly labeling the specific features you want replicated — meaningfully improves output quality, especially when the critical features are subtle or the example is long.
- Build a small personal library of real examples from your own past work. Five to ten strong examples covering your most common AI tasks eliminate the main friction point in using this technique consistently.
Three Things Most People Get Wrong About Giving Examples
You've seen how examples transform vague prompts into precise instructions. But before you start loading every prompt with samples, there are three widespread beliefs about example-based prompting that will quietly sabotage your results. These aren't edge cases — they're the patterns that trip up experienced professionals who assume their intuition about 'good examples' transfers directly from human communication to AI systems. It doesn't, at least not completely. The mechanics of how models like GPT-4 and Claude process your examples are different enough from human learning that some very reasonable assumptions turn out to be wrong. Correcting these three misconceptions will immediately improve your hit rate and save you the frustrating experience of wondering why a 'perfectly clear' prompt keeps producing mediocre output.
Myth 1: More Examples Always Produce Better Results
The logic seems airtight: if one example helps the model understand what you want, three examples must help more, and six must be better still. This is how human training often works — repeated exposure to varied samples helps a student internalize a pattern. So professionals building serious prompts frequently stack five, six, even eight examples, confident they're being thorough. The problem is that beyond a certain threshold, additional examples stop teaching the model new information about your target format and start introducing noise. Each example also consumes tokens (English prose runs to roughly 750 words per 1,000 tokens), which eats into the space available for actual output and, in longer prompts, can push earlier context further from the model's active attention window.
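To see what an example actually costs before you commit it to a prompt, you can count its tokens directly. A minimal sketch using OpenAI's tiktoken library (the sample text is a placeholder for your own example):

```python
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-class models
enc = tiktoken.get_encoding("cl100k_base")

example = (
    "The Q3 customer survey reveals three critical gaps in our "
    "onboarding experience. Response time expectations are unmet "
    "for 61% of new users."
)

n_words = len(example.split())
n_tokens = len(enc.encode(example))
print(f"{n_words} words -> {n_tokens} tokens")
# English prose typically lands near 750 words per 1,000 tokens
```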
Research from Google and Anthropic consistently shows diminishing returns after three to five examples for most formatting and style tasks. The model extracts the underlying pattern quickly. What additional examples sometimes do is confuse the model about which specific features of your examples are essential versus incidental. Suppose you want bullet points written in a dry, factual tone and all six of your examples happen to start with a number. The model may start treating that number-first structure as a required feature rather than a coincidence — producing outputs that technically match your examples but miss the actual goal. This is called spurious pattern extraction, and more examples make it more likely, not less.
The practical rule is to use two to four examples for most tasks and invest the saved token budget in clearer task instructions. If you're working with Claude's 200K context window or GPT-4's 128K window, token economy matters less for raw length — but clarity still does. Three tightly chosen examples that each demonstrate a distinct feature of your desired output (tone, structure, depth, vocabulary) outperform six loosely chosen examples that mostly repeat the same signals. When you do need more examples — for highly specialized formats like legal summaries or technical specifications — add them one at a time and test whether each addition actually changes the output quality.
More Examples ≠ More Clarity
Myth 2: Your Examples Should Be Perfect
When professionals sit down to write examples for a prompt, they instinctively reach for their best work — polished, complete, flawless samples that represent exactly what they'd want the final output to look like. This feels right. You're showing the model the gold standard. The problem is that 'perfect' examples often hide the reasoning behind their quality. A beautifully structured executive summary doesn't reveal why certain information was prioritized, why the language shifted from technical to plain at a specific point, or how length decisions were made. The model sees the output but not the logic. Without explicit guidance, it reverse-engineers rules from surface features, which produces outputs that look similar but lack the underlying judgment.
What actually works better is annotated or purposefully varied examples — samples that make the decision-making visible. You can do this by including a brief note alongside an example ('Notice how the third sentence pivots to business impact — always do this'), or by showing a before/after pair where the 'before' is a reasonable but inadequate version and the 'after' demonstrates the specific improvement you care about. Before/after pairs are particularly powerful because they force the model to understand what distinguishes adequate from excellent in your specific context. Claude and GPT-4 both handle inline annotations well; you don't need a special format, just parenthetical notes within or after the example.
There's also a counterintuitive case for deliberately imperfect examples: when you want the model to fill gaps or exercise judgment, showing examples that leave something slightly unresolved can produce more thoughtful outputs than showing examples that over-specify every detail. This is advanced territory, but the underlying principle is sound — your examples set the ceiling for the model's ambition. If every example is rigidly complete, the model optimizes for completion over quality. Leaving deliberate space in your examples signals that judgment and originality are expected, not just pattern-matching.
Prompt
Write a product feature description for our new calendar integration. Use this before/after pair as your guide:

BEFORE (too technical, buries the benefit):
'The system implements bi-directional API sync with OAuth 2.0 authentication across Google Calendar, Outlook, and Apple Calendar platforms, with conflict resolution logic running at 5-minute intervals.'

AFTER (lead with benefit, technical detail secondary):
'Your meetings sync automatically across Google Calendar, Outlook, and Apple Calendar — no manual updates, no double-bookings. The system checks for conflicts every five minutes and resolves them silently in the background.'

Key move: benefit in the first sentence, technical reassurance in the second.

Now write a description for our new Slack notification feature, which sends smart digests instead of individual pings.
AI Response
Stop getting buried in individual Slack pings. Our smart digest feature groups related notifications into a single, scannable summary — delivered when you actually have time to act on them, not the moment they're triggered. You control the frequency; the system handles the sorting.
Myth 3: Examples Replace the Need for Explicit Instructions
Part 1 established that examples outperform vague instructions. That's true. But a common overcorrection is to assume that good examples make explicit instructions unnecessary — that you can simply show the model what you want and skip the explanation entirely. This works for simple, well-defined tasks. For anything with conditional logic, audience-specific requirements, or constraints that your examples don't visibly demonstrate, relying on examples alone creates a fragile prompt. The model will faithfully reproduce the patterns it sees in your examples, but it won't infer rules that weren't demonstrated. If none of your examples happen to show what to do when data is missing, or when the subject matter is sensitive, the model will improvise — and not always in the direction you'd choose.
The most reliable prompts combine both: examples that demonstrate the target pattern and explicit instructions that cover the edge cases, constraints, and decision rules the examples can't show. Think of examples as the 'what' and instructions as the 'why and when.' A prompt that says 'Write in this style [example] — but if the client is in the healthcare sector, avoid all clinical statistics unless I've provided them' is far more robust than an example-only prompt, because the conditional rule can't be inferred from any example you'd reasonably include. Claude and GPT-4 handle this combination cleanly; you don't need to choose between showing and telling.
| Common Belief | The Reality | What To Do Instead |
|---|---|---|
| More examples = better output | Diminishing returns after 3–4 examples; extras introduce noise | Use 2–4 varied examples; invest saved tokens in clearer instructions |
| Examples should be polished and perfect | Perfect examples hide decision logic; models learn surface patterns | Use annotated examples or before/after pairs that expose your reasoning |
| Good examples replace explicit instructions | Examples can't demonstrate conditional rules or edge cases | Combine examples (the 'what') with explicit instructions (the 'when and why') |
| Any example in the right category works | Irrelevant examples mislead more than no examples at all | Choose examples that match the specific task, audience, and format you need |
| Examples from other sources work as well as your own | Examples from mismatched contexts import unwanted stylistic patterns | Use your own past work or purpose-written examples calibrated to your context |
What Actually Works: Building Prompts That Teach
Now that the misconceptions are cleared, here's the positive case — the specific practices that make example-based prompting genuinely powerful. The first principle is example diversity over example quantity. Each example you include should demonstrate a distinct dimension of your desired output. If you want the model to write concise executive summaries, your first example might demonstrate the right length, your second the right level of technical vocabulary, and your third how to handle a situation where the data is ambiguous. Three examples, three lessons. This approach forces you to be explicit with yourself about what you actually care about in the output — which itself is clarifying, regardless of what it does for the model.
The second principle is to position your examples strategically within the prompt. Models like GPT-4 and Claude process prompts sequentially, and the position of information affects how much weight it receives. For short prompts under 1,000 tokens, position matters less. For longer prompts, place your most important example closest to the actual task instruction — not buried in the middle of a long setup. A common structure that performs well is: task description → constraints → examples → specific request. This mirrors how you'd brief a skilled freelancer: tell them what the job is, set the guardrails, show them a reference, then give them the actual assignment. Frontloading examples before the task instruction slightly reduces their effectiveness because the model hasn't yet been primed on what the example is meant to teach.
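One way to keep that ordering consistent across prompts is to assemble them programmatically. Here is a minimal Python sketch of the task, constraints, examples, request structure described above; the function name, labels, and sample values are illustrative, not a fixed convention:

```python
def build_prompt(
    task: str,
    constraints: list[str],
    examples: list[tuple[str, str]],  # (sample text, annotation) pairs
    request: str,
) -> str:
    """Assemble a prompt in task -> constraints -> examples -> request order."""
    parts = [task, "", "CONSTRAINTS:"]
    parts += [f"- {c}" for c in constraints]
    for i, (sample, note) in enumerate(examples, 1):
        # The annotation exposes the decision logic the sample alone can't show
        parts += ["", f"EXAMPLE {i} ({note}):", "---", sample, "---"]
    parts += ["", "NOW PROCESS THIS:", "---", request, "---"]
    return "\n".join(parts)


# Example call; the two variables stand in for your real text
q3_summary = "The Q3 customer survey reveals three critical gaps..."
q4_data = "[raw Q4 NPS data]"

prompt = build_prompt(
    task="Write an executive summary for a senior stakeholder.",
    constraints=["Under 120 words", "Open with the key finding"],
    examples=[(q3_summary, "short paragraphs, specific number up front")],
    request=q4_data,
)
print(prompt)
```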
The third principle is to test your examples in isolation before embedding them in a complex prompt. Paste a single example into ChatGPT or Claude and ask: 'What rules would you infer from this sample about the format, tone, and length I want?' The model's answer tells you exactly what your example is teaching — and often surfaces patterns you didn't intend to include. This takes two minutes and prevents the frustrating cycle of tweaking a complex prompt without knowing which element is causing the problem. If the model infers something wrong from your example in isolation, it will infer the same thing when the example is embedded in a longer prompt. Fix the example first, then build the full prompt around it.
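The audit needs no special tooling; it is one question wrapped around your example:

```text
What rules would you infer from this sample about the format, tone,
and length I want? List them as bullets.

---
[paste one example here, and nothing else]
---
```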
The Two-Minute Example Audit
Goal: Create a reusable, three-example prompt template for a recurring work task, where each example teaches a distinct quality dimension and is validated through the isolation audit.
1. Choose a writing task you do regularly at work — a status update, client email, meeting summary, or similar recurring output.
2. Write down three specific qualities that distinguish your best versions of this output from average ones. These become your three dimensions (e.g., 'tight opening sentence,' 'no jargon,' 'ends with a clear next step').
3. Find or write one example that clearly demonstrates dimension 1. Keep it to 80–120 words.
4. Find or write a second example that demonstrates dimension 2 — ideally one where dimension 1 is neutral or absent, so the model learns they're separate features.
5. Find or write a third example that demonstrates dimension 3, again keeping the other dimensions unremarkable.
6. Write a two-sentence annotation for each example explaining what it's teaching. Use plain language: 'Notice how this example...' or 'This demonstrates...'
7. Assemble the three annotated examples into a prompt with a clear task instruction at the end. Test it on a real piece of work you currently need to produce.
8. Run the two-minute example audit: paste each example individually into the model and ask what rules it infers. Fix any mismatches before using the full prompt.
9. Save the prompt as a reusable template, noting which dimensions each example covers so you can swap examples as your needs evolve.
Frequently Asked Questions
- Can I use examples from the internet or published sources? Yes, but match the context carefully — an example from a B2B SaaS blog will import that world's vocabulary and assumptions into your output. If you're writing for a law firm, use examples from legal communications, not tech content.
- What if I don't have a good example of what I want? Write a rough draft yourself, even a mediocre one, and tell the model 'Here's a rough attempt — improve it while keeping [specific feature].' Your imperfect draft is more useful than no example at all.
- Do examples work the same way in system prompts versus user messages? In tools like the ChatGPT API and Claude's system prompt field, examples in the system prompt set a persistent baseline for the entire conversation; examples in user messages affect only that turn. For consistent style across a long session, put your core examples in the system prompt (a code sketch follows this list).
- How do I handle tasks where every output is different — like custom client proposals? Use structural examples rather than content examples. Show the model the shape of a strong proposal (section order, length ratios, tone shifts) without using content that could bleed into the new output.
- Will the model eventually 'forget' my examples in a long conversation? Yes. In conversations exceeding roughly 8,000–10,000 words of history, earlier context loses influence in most models. If you're working in a long session, re-paste your key examples before the task that needs them most.
- Is it better to label my examples explicitly, like 'EXAMPLE 1:'? Yes, for prompts with multiple examples. Explicit labels help the model distinguish examples from instructions, reducing the chance it treats your example text as a direct command rather than a reference pattern.
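To make that system-versus-user split concrete, here is a minimal sketch using the OpenAI Python SDK; the model name and example text are placeholders, and Anthropic's SDK offers an equivalent top-level system parameter:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_EXAMPLE = (
    "The Q3 customer survey reveals three critical gaps in our "
    "onboarding experience. ..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Examples in the system message persist as the baseline for every turn
        {
            "role": "system",
            "content": (
                "Write executive summaries in the style of this example:\n"
                f"---\n{STYLE_EXAMPLE}\n---"
            ),
        },
        # The user message carries only this turn's input
        {"role": "user", "content": "Summarize the Q4 NPS findings: ..."},
    ],
)
print(response.choices[0].message.content)
```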
Key Takeaways From This Section
- Two to four examples outperform larger sets when each example demonstrates a distinct dimension of quality — quantity without variety adds noise, not signal.
- Perfect, polished examples hide decision logic. Annotated examples and before/after pairs expose the reasoning the model needs to generalize correctly.
- Examples and explicit instructions are complementary, not interchangeable. Examples show the pattern; instructions cover the conditional rules and edge cases no example can demonstrate.
- Position your most important example closest to the task instruction in longer prompts — don't bury it in setup text.
- Run the two-minute isolation audit on any new example before embedding it in a full prompt. What the model infers from the example alone is what it will infer inside the full prompt.
- For recurring tasks, build reusable example templates with labeled dimensions so you can swap individual examples as your needs change without rebuilding the whole prompt.
What You're Getting Wrong About Examples in Prompts
Most professionals who start using examples in their prompts make the same three mistakes. They believe more examples always produce better results. They think examples only matter for formatting and style. And they assume that if an example works once, it will work consistently across similar tasks. All three beliefs are understandable — and all three will quietly sabotage your prompting. Understanding where each goes wrong gives you a sharper mental model for one of the most powerful techniques in AI communication.
Myth 1: More Examples Always Means Better Output
The instinct to pile in examples feels logical. If two examples help, surely five will help more? Research on few-shot prompting tells a different story. Beyond three to five examples, quality plateaus — and in many cases, output quality actually degrades. GPT-4 and Claude both process examples as tokens, consuming context window space that could otherwise hold your actual instructions, constraints, or input data. A prompt stuffed with eight examples leaves less room for nuance in your real request.
There's a more subtle problem too. When you provide many examples, the model pattern-matches aggressively to those examples rather than reasoning flexibly about your actual goal. You end up with outputs that mimic the surface features of your examples — the sentence length, the vocabulary, the structural rhythm — rather than outputs that genuinely serve your purpose. Three well-chosen, diverse examples that cover edge cases outperform six repetitive examples every single time.
The sweet spot for most professional tasks is two to four examples. One example establishes the pattern. Two or three examples show the model that the pattern holds across variation. A fourth example is occasionally useful when you need to demonstrate how to handle an exception. Beyond that, you are almost certainly adding noise rather than signal, and you are paying for it in both token cost and output coherence.
The Over-Example Trap
Myth 2: Examples Only Shape Style and Format
Many professionals use examples purely to control surface-level output — the bullet point format, the paragraph length, the professional tone. That's valuable, but it captures maybe 30% of what examples can actually do. Examples also teach the model your reasoning logic, your decision criteria, and your implicit professional standards. When you show a model how you handle ambiguous customer feedback — which complaints get escalated, which get a templated reply — you are transmitting judgment, not just formatting.
This is where few-shot prompting becomes genuinely powerful for knowledge workers. An analyst who shows Claude two examples of how they classify market signals — one bullish, one bearish, each with brief reasoning — is training the model on their analytical framework, not just their preferred sentence structure. The model extracts the underlying logic and applies it to new cases. This is fundamentally different from giving instructions, because instructions require you to articulate rules explicitly, while examples allow implicit knowledge to transfer.
The implication is significant: your examples should be chosen to demonstrate decisions, not just appearances. Pick examples that show the model a moment where judgment was required and how you resolved it. Show it a borderline case alongside a clear case. That contrast does more teaching than five examples of the easy, obvious scenario.
Prompt
Classify each customer comment as Urgent, Standard, or Positive. Here are two examples:

Comment: 'I've been waiting 3 weeks and nobody has responded to my emails.'
→ Urgent (explicit wait time + failed contact attempts = escalation risk)

Comment: 'The onboarding took a bit longer than expected but the team was helpful.'
→ Standard (mild friction, no urgency signal, handled with acknowledgment)

Now classify: 'Your product is great but the billing page is really confusing and I nearly cancelled.'
AI Response
Standard — leaning Urgent. The positive sentiment reduces churn risk slightly, but 'nearly cancelled' triggered by a UX issue signals a fixable problem with real retention stakes. Flag for product team and send a personal reply rather than a template.
Myth 3: A Good Example Works Universally
A prompt that works brilliantly on Tuesday with ChatGPT-4o does not automatically transfer to Claude 3.5 Sonnet, to a different task type, or even to the same task with a different input dataset. Examples are context-sensitive. They teach the model about the specific domain, tone, and decision space you demonstrated — and when any of those variables shift, the example's effectiveness shifts too. Professionals who build a library of 'proven prompts' and apply them mechanically across different contexts are often puzzled when results deteriorate.
The better mental model is to treat examples as calibration tools rather than permanent fixtures. When you start a new task type, draft two fresh examples that are native to that context. When you switch models — say from Gemini to Claude for a specific workflow — test your examples explicitly, because different models have different sensitivities to example structure and length. Your examples are not a one-time investment; they are a working component of a prompt that gets maintained.
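Testing explicitly can be as simple as running the identical prompt through both SDKs and comparing the outputs side by side. A minimal harness, assuming both clients are installed and API keys are set in the environment; the model identifiers are the versions this article names and may need updating:

```python
import anthropic
from openai import OpenAI

PROMPT = "..."  # your example-based prompt, unchanged between models

gpt_out = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PROMPT}],
)
claude_out = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": PROMPT}],
)

print("GPT-4o:\n", gpt_out.choices[0].message.content)
print("\nClaude:\n", claude_out.content[0].text)
```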
| Common Belief | What's Actually True |
|---|---|
| More examples = better output | 2–4 targeted examples outperform 6+ repetitive ones |
| Examples only control format and style | Examples transfer reasoning logic and implicit judgment |
| A good example works across all contexts | Examples are context-specific and need calibration per task |
| Examples replace clear instructions | Examples and instructions work together — neither replaces the other |
| Any example is better than no example | A poor or contradictory example actively misleads the model |
What Actually Works: Best Practices for Example-Driven Prompting
The most effective approach pairs one sentence of explicit instruction with two to three examples that demonstrate judgment at the edges. Start with a clear statement of what you want — the task, the output format, the audience — then use examples to show the model how to handle the cases where the instruction alone would be ambiguous. This combination outperforms either approach used alone. Instructions give the model the map; examples give it the terrain.
Choose examples that are genuinely diverse. If you are showing the model how to write executive summaries, don't use three summaries of quarterly financial reports. Use one financial report, one product launch update, and one risk briefing. That diversity teaches the model the underlying structure of a good executive summary rather than the surface features of a financial document. Diversity in your examples is what creates flexibility and robustness in the output.
Label your examples explicitly. Instead of just showing input and output pairs, add a brief annotation — one sentence explaining why the output looks the way it does. 'This response is short because the original question was factual, not exploratory.' That annotation makes the model's pattern-extraction more accurate and gives you a natural place to encode the professional standards that live in your head but rarely make it into formal documentation.
Build a Personal Example Library
Goal: Produce a reusable prompt template containing one instruction and two annotated examples, tested against a real input, ready to use in your daily workflow.
1. Choose one task you do repeatedly — classifying emails, summarizing meeting notes, drafting client updates, or similar.
2. Write a one-sentence instruction that states the task, the desired output format, and the intended reader.
3. Find or write a real example of an input you have handled well — something where your judgment was clear.
4. Write the ideal output for that example, exactly as you would want the model to produce it.
5. Add a one-sentence annotation below the output explaining the key decision or standard it demonstrates.
6. Repeat steps 3–5 for a second example that covers a meaningfully different scenario or edge case.
7. Combine the instruction and both annotated examples into a single prompt template.
8. Test the template in ChatGPT or Claude using a new, real input from your current work.
9. Save the template and the test output to your personal example library — this is your working asset.
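Assembled, a template from these steps might look like the following, reusing the classification task from the earlier example; every placeholder is illustrative:

```text
Classify each customer comment as Urgent, Standard, or Positive,
one line per comment, for the support team lead.

EXAMPLE 1:
Comment: 'I've been waiting 3 weeks and nobody has responded to my emails.'
Output: Urgent
Why: explicit wait time plus failed contact attempts signals escalation risk.

EXAMPLE 2:
Comment: 'The onboarding took a bit longer than expected but the team was helpful.'
Output: Standard
Why: mild friction, no urgency signal.

NOW CLASSIFY:
[paste new comments here]
```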
Frequently Asked Questions
- Can I use bad examples to show the model what NOT to do? Yes, with caution — negative examples (showing an output to avoid) can work, but label them explicitly as 'bad example' or the model may inadvertently learn the wrong pattern.
- Do examples work differently in Claude vs. ChatGPT? Broadly similar, but Claude tends to follow example structure more literally, while GPT-4o is slightly more willing to adapt. Test your critical prompts on both if you use both tools.
- How long should my examples be? Match the length of your target output. If you want a three-sentence summary, your examples should each be three-sentence summaries — not paragraphs.
- Should examples come before or after my main instruction? Put your instruction first, then examples. The model reads top-to-bottom and benefits from knowing the goal before it processes the demonstrations.
- What if I can't find a real example to use? Write a synthetic one — fabricate a realistic input and the ideal output. Synthetic examples work just as well as real ones as long as they accurately represent the task.
- Do examples increase my API costs? Yes — every example adds tokens. At GPT-4o pricing (~$5 per million input tokens), two 100-word examples add roughly $0.001 per call, which is negligible for most professional use but worth tracking at scale.
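As a back-of-envelope check on that last figure (the rate per million tokens is the assumed price quoted above; check your provider's current pricing):

```python
words = 200                     # two 100-word examples
tokens = words * 1000 / 750     # ~750 words per 1,000 tokens of English prose
price_per_million = 5.00        # USD per million input tokens (assumed rate)

cost = tokens * price_per_million / 1_000_000
print(f"{tokens:.0f} tokens -> ${cost:.5f} per call")  # 267 tokens -> $0.00133
```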
Key Takeaways
- Two to four high-quality examples outperform larger sets — beyond four, you typically add noise, not accuracy.
- Examples transfer reasoning and judgment, not just style — choose examples that show how you handle ambiguous or borderline cases.
- Examples are context-specific; recalibrate them when switching tasks, domains, or AI models.
- Pair explicit instructions with examples — instructions define the goal, examples demonstrate the standard.
- Diverse examples create flexible outputs; repetitive examples create brittle, over-fitted outputs.
- Annotate your examples with a sentence of reasoning — it makes pattern-extraction more accurate and encodes your implicit professional standards.
- Maintain a personal example library tagged by task type to accelerate future prompting and preserve your best judgment.
Review Questions

- A colleague provides eight examples in a prompt and finds the output feels mechanical and repetitive. What is the most likely cause?
- You want to use examples to help Claude classify customer support tickets. Which approach will produce the most useful results?
- You have a prompt template with two excellent examples that works perfectly in ChatGPT. You switch to Gemini for the same task and results are inconsistent. What should you do first?
- Which of the following best describes what well-chosen examples communicate to an AI model?
- A prompt reads: 'Summarize this report. Here are three examples, all from Q3 financial reports.' The model produces excellent summaries of financial reports but struggles when applied to a product launch briefing. What is the core problem?
