███╗   ███╗  ██████╗  ██████╗  ██╗   ██╗ ██╗      ███████╗     ██████╗   ██╗
████╗ ████║ ██╔═══██╗ ██╔══██╗ ██║   ██║ ██║      ██╔════╝    ██╔═████╗ ███║
██╔████╔██║ ██║   ██║ ██║  ██║ ██║   ██║ ██║      █████╗      ██║██╔██║ ╚██║
██║╚██╔╝██║ ██║   ██║ ██║  ██║ ██║   ██║ ██║      ██╔══╝      ████╔╝██║  ██║
██║ ╚═╝ ██║ ╚██████╔╝ ██████╔╝ ╚██████╔╝ ███████╗ ███████╗    ╚██████╔╝  ██║
╚═╝     ╚═╝  ╚═════╝  ╚═════╝   ╚═════╝  ╚══════╝ ╚══════╝     ╚═════╝   ╚═╝
hook · pre-setup · beads 01–09 · v1
On this page
  1. Hook
  2. Pre-setup
  3. Beads
    1. 01 Configure how Claude talks to you
    2. 02 See the jagged edge
    3. 03 Ask Claude about Claude
    4. 04 Brief Claude like a colleague
    5. 05 Make it argue both sides
    6. 06 Long sessions: summarise and restart
    7. 07 Catch the cave
    8. 08 Ground the answer in sources
    9. 09 Know when to bail
░▒▓████████████████████████▓▒░ Hook / Layer 01 ░▒▓████████████████████████▓▒░

You use AI in a tab. You get plausible answers, paste them, and assume you got more done. You produce slop. You don't notice.

Nine exercises. Fifteen minutes each. You will produce work better than your own first draft, with sources you can defend. You will watch Claude lie, cave, and hand you confident mush. You will know how to catch each one. You walk away sharper, not duller.

░▒▓████████████████████████▓▒░ Pre-setup / Tools ░▒▓████████████████████████▓▒░

Reference material. Read once, return when needed.

Surfaces: Chat, Cowork, Code

Three modes. Pick by task.

Chat: conversation. You ask, it answers. Use it for drafting, exploration, Q&A, and the work in this module.
Cowork: conversation plus file and folder access on your machine. Use it for multi-step tasks on your own files (memos, sheets, decks).
Code: agentic coding. Reads, writes, runs code. Use it for building stuff.

All three live in the Claude desktop app. Chat also runs in the browser at claude.ai. Code also runs in the terminal.

For Module 1, use Chat. Browser or desktop app, your choice. Cowork and Code are later-module topics.

Settings

At the top of every new chat:

  • Set the model (see below). The choice locks for the conversation. You cannot switch mid-thread.
  • Toggle adaptive thinking on. The model decides how much to think based on the task.
  • Toggle web search on. The model fetches live sources to ground its answers. Beads 8 and 9 lean on this.

Settings live in two places: model and toggles at the top of the chat or in the input bar, and standing instructions in Settings → "Instructions for Claude." Everything else is set conversation by conversation.

Models

Anthropic ships three families.

Haiku: fastest, lowest cost, ~200K-token context window. Best for high-throughput automation and latency-sensitive Q&A.
Sonnet: middle on speed and cost, 200K-token context window. Best for drafting, summaries, most everyday work.
Opus: slowest, highest cost, 1M-token context window. Best for hard reasoning, exploration, judgment calls.

When in doubt, use Opus. If it feels slow or the task does not need the extra punch, drop to Sonnet. Avoid Haiku for this module; it is built for throughput, not the work we are doing.

(Context window and capability shift between versions. Check claude.ai for current specs.)

Tokens

A token is a chunk of text the model processes. Roughly 0.75 of a word. Context windows and prices are measured in tokens. You will not count them. You will see them referenced. That is what they are.
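
If you want a feel for the numbers, the arithmetic fits in a few lines. A minimal sketch, assuming the rough 0.75-words-per-token rule above; real tokenizers vary with language and formatting.

# Rough token estimate from a word count, using the ~0.75 words-per-token
# rule of thumb. A sanity check, not an exact count.
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    return round(word_count / words_per_token)

# A 1,500-word memo is roughly 2,000 tokens; a 200K-token context window
# holds on the order of 150,000 words.
print(estimate_tokens(1_500))  # 2000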

Context rot

Long contexts are not used uniformly well. As a conversation grows, the model's grasp on older content degrades. Information in the middle of a long context gets weighted less (Liu et al, "Lost in the Middle," 2307.03172). Sycophancy from earlier turns sticks and shapes later answers (SycEval, 2502.08177).

Practical move when sessions run long: summarise the state, open a fresh chat, paste the summary as context. Bead 6 covers the working pattern; bead 9 covers when to bail to a clean chat entirely.

░▒▓████████████████████████▓▒░ The Stack / Beads 01–09 ░▒▓████████████████████████▓▒░
██████╗  ███████╗  █████╗  ██████╗      ██████╗   ██╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ███║
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ╚██║
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║  ██║
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝  ██║
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝   ╚═╝

Configure how Claude talks to you

Claude flatters by default. Once you stop trusting the praise, you stop trusting the rest. The Instructions panel is where you fix that. One paragraph, saved once, applies to every chat after.

The standing instructions

Settings → "Instructions for Claude." Paste the template. Edit the orange-fill brackets.

I work at a [firm type] as a [role]. Most of what I send you
involves [research / decisions / writing / analysis]. Write
for a numerate, time-poor reader.

Don't flatter me. Don't say "great question." Don't soften
critique. If I am wrong, say so with reasoning. Default to
skepticism on my claims and prove them before agreeing.

Say "I don't know" rather than synthesise. Don't speculate
without flagging the speculation. When you cite, link the
source and quote the relevant line.

Default to prose, not bullets. No em-dashes. No preamble.

Exercise: feel the shift

  1. Before. Open a new chat in claude.ai. Opus, adaptive thinking on, web search on, instructions panel still empty. Paste:
    I think I am better at using AI than most of my colleagues.
    I get cleaner answers, catch the model's mistakes more often,
    and use it for real work. Stress-test where I am wrong.
    Read the response. Notice the tone. Save it.
  2. Configure. Open Settings → Instructions for Claude. Paste the template above. Edit the bracketed placeholders.
  3. After. Open a new chat. Paste the same stress-test prompt. Compare side by side.

Key Takeaway

The model is the same. The instructions changed how it talks to you. Same prompt, different colleague.

Why this works

Sycophancy is a persistent, measurable failure mode (SycEval, 2502.08177). System-level instructions reduce it; they do not eliminate it. Beads 7 and 9 cover what slips through.

██████╗  ███████╗  █████╗  ██████╗      ██████╗   ██████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ╚════██╗
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║  █████╔╝
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║ ██╔═══╝
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝ ███████╗
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝  ╚══════╝

See the jagged edge

The model is not uniformly smart. It is exceptional at some things and surprisingly bad at others, and the failures are not where you would expect them. A graduate-level reasoning question goes fine; a question a child would get right falls over. Same model, same conversation. The pattern is jagged intelligence: capability is high on average and unreliable in patches you cannot predict in advance. Until you have seen it firsthand on a model you trusted, you will trust the wrong answers without knowing it.

This bead is empirical. Run two questions across three models. Notice what you cannot predict.

The two questions

A simple one and a hard one.

Simple.

I need to wash my car. The car wash is 50 yards from my front
door. Gas is expensive but the weather is nice. Should I walk
or drive?

Hard.

A bond fund holds two positions, both currently valued at par:
- 100 units of bond A: 5-year duration, 4% YTM
- 200 units of bond B: 10-year duration, 5% YTM

Two simultaneous shocks:
- Rates rise by 100 bps across the curve
- Bond B's credit spread widens by 50 bps; bond A is unchanged

Estimate the percentage P&L impact on the portfolio. State
your assumptions, show the steps, and round to the nearest 0.1%.
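
For checking the six answers in the exercise below: one defensible reference result, a minimal sketch using the linear duration approximation. The assumptions are not stated in the question, so they are supplied and flagged here: modified duration, no convexity, equal par value per unit, and bond B's spread widening priced through the same ten-year duration. A model that makes different assumptions can land elsewhere; what matters is whether its steps hold together.

# Reference calculation under the duration-only approximation:
#   % price change ≈ -duration × yield change
# Assumptions (flagged, not given in the question): modified duration,
# no convexity, equal par value per unit, spread widening priced
# through bond B's ten-year duration.
dur_a, dur_b = 5, 10
shock_a = 0.0100               # +100 bps rates, spread unchanged
shock_b = 0.0100 + 0.0050      # +100 bps rates plus 50 bps spread widening

pnl_a = -dur_a * shock_a       # -5.0% on bond A
pnl_b = -dur_b * shock_b       # -15.0% on bond B

w_a, w_b = 100 / 300, 200 / 300    # par-value weights: one third A, two thirds B
portfolio_pnl = w_a * pnl_a + w_b * pnl_b
print(f"{portfolio_pnl:.1%}")      # about -11.7%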

Exercise: run both, on three models

Six runs total. Each model needs its own chat (you cannot switch mid-thread, see pre-setup). Adaptive thinking on, web search on, standing instructions in place from Bead 1.

  1. Simple, three models. New chat on Opus. Paste the simple question. Save the answer. Repeat in fresh chats with Sonnet and Haiku.
  2. Hard, three models. New chat on Opus. Paste the hard question. Save. Repeat with Sonnet and Haiku.
  3. Compare. Lay the six answers side by side. Three things to look for: where the simple question went wrong (solemn enumeration of factors, recommendation to drive "if you have errands", a both-sides hedge); whether the hard question stayed competent across all three models, even on Haiku; where the variance was higher than you predicted. If all three models answered the simple question correctly, the lesson still lands. You could not have predicted that without checking.

Key Takeaway

You cannot predict where the model will fail by predicting how hard the question is. You must check empirically, especially on the answers that look obvious. Easy questions are not the safe ones; they are the ones where you are most likely to skip checking.

Why this works

Capability in modern LLMs is uneven across task types in ways that surprise the model trainers (Karpathy, 2024, on jagged intelligence; published model cards consistently report wide variance across benchmarks for the same model). Smaller models within a family often do nearly as well as larger ones on hard reasoning while losing more on simple commonsense; the failure pattern is not a smooth function of question difficulty. Running the same prompt across three models is the cheapest available test of where the edge sits today.

██████╗  ███████╗  █████╗  ██████╗      ██████╗   ██████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ╚════██╗
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║  █████╔╝
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║  ╚═══██╗
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝ ██████╔╝
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝  ╚═════╝

Ask Claude about Claude

Most people learn the model's limits by getting burned. There is a faster way: ask the model. It knows roughly where its training data thins, what it tends to make up, and what an "I don't know" answer looks like in its own voice. It will tell you if you ask before it has anything to defend. The standing instructions from Bead 1 make that conversation more honest.

The calibration prompt

Save this. Run it at the top of any chat where the answer matters and you do not yet know the model's weak points on the topic.

You are about to help me with [the UK FCA's Sustainability Disclosure Requirements (SDR), launched in 2024, and how it differs from the EU's SFDR].

Five questions before we start. Answer all five.

1. What is your knowledge cutoff date? Be precise.
2. What about this topic is likely out of date?
3. Where does your training data on this topic come from?
   What is overrepresented and what is missing?
4. What kinds of mistakes do you tend to make on questions
   like this? Be specific. No reassurance.
5. If I ask you something you do not know, what does your
   answer tend to look like? How would I tell?

The follow-up question

Run this after the calibration response.

What are the three FCA SDR labels for sustainable funds, what
does each require, and how have asset managers responded since
the November 2024 entry-into-force?

Exercise: calibrate before you ask

  1. Run the calibration. Paste the prompt as written, or swap the topic for one of your own. Read the answers slowly.
  2. Save the response. Note especially points four and five. You will return to them.
  3. Ask the real question. In the same chat, paste the follow-up. Read the answer through the lens of points four and five. Where is the model confident? Where is it hedging in the way it warned you about? Where would you have missed the hedge if you had not just read the calibration?

Key Takeaway

Calibrate before you ask. The model is more honest about what it does not know before it has produced an answer it has to defend.

Why this works

Self-reported uncertainty in language models is imperfect but informative; calibration improves when models reason about their own confidence before producing the answer (Lin et al, "Teaching Models to Express Their Uncertainty in Words," 2205.14334). Asking up front means the calibration is not contaminated by the answer the model is about to commit to. The same logic that makes the standing instructions work, applied per topic instead of per identity.

██████╗  ███████╗  █████╗  ██████╗      ██████╗  ██╗  ██╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ██║  ██║
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ███████║
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║ ╚════██║
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝      ██║
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝       ╚═╝

Brief Claude like a colleague

A one-line prompt produces a one-line answer dressed up. The fix is the same as what you would do for a smart colleague on Slack: tell them the situation, what you actually want, what you do not want, and ask them to come back with questions before they start. Same move with Claude. Two halves: write the brief, then make the model fill the gaps you missed.

The two prompts

You will run both. The point is the comparison.

One-liner.

Should I add 5% gold to a 60/40 multi-asset portfolio?

Brief.

Context: I run a balanced multi-asset portfolio for UK pension fund clients with a 60/40 (equity/fixed income) target. Drawdown risk and inflation hedging are both concerns post-2022. I have not held a strategic gold allocation. A trustee has asked whether to add one.

Task: A short note (~300 words) covering the case for and against a 5% strategic gold allocation, the role gold would play in this portfolio, and the operational alternatives (physical, ETF, futures).

Constraints: UK pension audience, no jargon explanations, no both-sides hedging. Prose, not bullets. Under 300 words.

Before you answer, ask me the numbered questions you need
answered to do this well. List them all at once. I will
answer in one block. Where the answer is from a small set
of options, give me numbered choices I can pick by number.

The first three sections of the brief are the brief itself. The fourth is the unlock: even the best brief misses something, and the model will tell you what if you let it. Multiple-choice questions are the high-leverage part. They turn "what kind of answer do you want" from a problem you have to solve into a menu you have to pick from.

Exercise: a real decision, three ways

  1. Run the one-liner. New chat. Paste. Save the answer.
  2. Run the brief. Fresh chat. Paste the brief. Answer the questions Claude asks. Read the final answer.
  3. Compare. Three artefacts: the one-liner answer, the questions Claude asked back (each one marks a gap in your brief), and the answer it gave after you replied. Read them side by side.
  4. Optional. Now do the same with a real decision you face. The lesson lands either way; with a real decision, the answer is also useful.

Key Takeaway

Structure tells you what is missing from your own thinking. The ask-back tells you what is missing from your brief. The third answer is usually the only one worth using; the first two are the diagnostics.

Why this works

Structured prompts that separate context, task, and constraints reduce ambiguity and improve output quality across most task types (Anthropic, "Prompting best practices"). The ask-back move is structured self-critique with you in the loop (Madaan et al, "Self-Refine," 2303.17651) and closes the gap between what you said and what you meant. Multiple-choice elicitation is the operational form: it turns Claude's hidden uncertainty about your context into a click.

██████╗  ███████╗  █████╗  ██████╗      ██████╗  ███████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ██╔════╝
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ███████╗
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║ ╚════██║
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝ ███████║
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝  ╚══════╝

Make it argue both sides

Claude will pick a side and defend it confidently. Sometimes the side it picks is yours, which feels good and proves nothing. The fix is to force both arguments out of the model in the same turn, then make it synthesise. You stop using Claude as an oracle and start using it as a sparring partner. This is the bead that earns Module 1 the name exploration.

The both-sides prompt

I am leaning toward [switching from active to passive for the 30% UK equity sleeve in my multi-asset portfolio]. The case for X, as I see it, is roughly [active managers in UK equities have failed to beat the FTSE All-Share net of fees over rolling five-year windows since 2020; passive saves around 80 basis points per year on this sleeve; client communication is simpler].

Three asks, in this order.

1. Build the strongest case for X. Steelman it. Add reasons
   I have not considered. Argue as if you held the view.

2. Build the strongest case AGAINST X. Steelman the opposite.
   What would I have to believe for the opposite to be the
   right call? Argue as if you held that view. No "balance"
   language. No both-sides hedge.

3. Synthesise. Where does the case for X actually hold up
   under the strongest opposing case, and where does it not?
   What single fact, if I had it, would resolve the tension?

The "single fact" line in step three is the high-leverage part. It tells you what the next bit of work is.

Exercise: a half-made decision

  1. Run the prompt. Paste as written, or swap the bracketed decision for one you have already half-made. Read all three sections. The two cases are the inputs; the synthesis is the work.
  2. Act on the synthesis. If it names a fact you do not have, get it. The model just told you what would change its view. Now you know what would change yours.

Key Takeaway

Steelman both sides, synthesise after, and use the synthesis to find the missing fact. Claude is the cheapest sparring partner you have access to; use it to argue, not to agree.

Why this works

Multi-perspective generation reduces the lock-in of a single chain-of-thought and surfaces information that one-shot prompting hides (Wang et al, "Self-Consistency Improves Chain of Thought Reasoning," 2203.11171). Adversarial framing within a single model call approximates the "have you talked to someone who disagrees" sanity check that good investment processes already require, at near-zero cost. The synthesis-and-missing-fact step turns a debate into a research task with a clear next action.

██████╗  ███████╗  █████╗  ██████╗      ██████╗   ██████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ██╔════╝
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ███████╗
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║ ██╔═══██╗
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝ ╚██████╔╝
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝   ╚═════╝

Long sessions: summarise and restart

Long chats decay in two ways. The model's grasp on early decisions weakens (context rot, see pre-setup). And whatever sycophancy slipped through the early turns sticks and shapes later ones (Bead 7 covers the catching). The fix in both cases is the same: do not keep growing the chat. Summarise it, open a fresh chat, paste the summary as the first message, continue.

The seed prompt

If you do not have a long chat already running, paste this to start one. Stop when you have ten or more exchanges in.

I am thinking about whether to add a 5% strategic gold allocation
to a 60/40 multi-asset portfolio for UK pension clients. Treat me
as a thinking partner. Ask me questions one at a time, and walk
me through the implications of my answers. Do not summarise; keep
going until I have explored the question fully. We should reach
at least ten exchanges.

The summary prompt

Run inside the chat that is running long.

We are far enough into this conversation that I want to
compact it. Write a summary I can paste into a fresh chat
to pick up from here.

Cover:
- What I am trying to do
- Decisions we have already made and the reasoning behind them
- Open questions still to resolve
- Constraints and preferences I have stated
- What you would want to know that I have not told you yet

Write it in second person, addressed to a fresh instance of
yourself. Be terse. No preamble. No flattery.

The "what you would want to know that I have not told you yet" line is the part you would never write yourself. It catches gaps neither of you has named.

Exercise: compact a real chat

  1. Spawn a long chat. Use one you already have running (fifteen exchanges or more), or paste the seed prompt above into a new chat and have a real exchange. Stop after ten or more turns.
  2. Summarise. Run the summary prompt. Read what comes back. Edit the parts that misrepresent you. Add anything load-bearing that is missing.
  3. Restart. Open a fresh chat. Paste the summary as the first message. Continue the work.
  4. Compare. The fresh chat answers cleaner. Notice where the old chat had been hedging without telling you why.

Key Takeaway

Long chats decay. Compact and restart. The summary is portable and editable; the chat history is neither.

Why this works

Information in the middle of a long context is weighted less than information near the start or end (Liu et al, "Lost in the Middle," 2307.03172). Sycophancy from early turns persists across later ones (SycEval, 2502.08177). A compact summary keeps the load-bearing decisions in the high-attention regions of a fresh context and lets you edit out the residue. The technique generalises: the same move underlies how Claude Code recovers from long agentic sessions, covered in Module 2.

██████╗  ███████╗  █████╗  ██████╗      ██████╗   ██████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ██╔════╝
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ███████╗
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║ ██╔═══██╗
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝ ╚██████╔╝
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝   ╚═════╝

Catch the cave

The standing instructions from Bead 1 reduce sycophancy. They do not eliminate it. The model still caves under pressure, especially polite pressure. You ask a question, get an answer, push back mildly, watch the model reverse. You were not right; the model was just compliant. Once you can see this happening, you can correct for it. Until you can, every confident answer is suspect for the wrong reason.

The four tells

  • Over-agreement. "You raise an excellent point." "That is a really sharp observation." Fluent praise after you have said almost nothing.
  • Hedge-then-flip. First answer is firm. You say "are you sure?" Second answer is firm in the opposite direction. The model did not reconsider; it yielded.
  • Drop the claim. You push back on a specific factual claim. The claim simply disappears in the next turn. No correction, no acknowledgement; it is just not there anymore.
  • Both-sides retreat. Direct question, direct answer expected. You get "there are valid arguments on both sides." It is a cave dressed as balance.

Exercise: provoke each tell on purpose

  1. Ask the factual question.
    What is the typical annual management fee on a UCITS-domiciled
    actively managed European equity fund for institutional
    investors as of 2025?
    Read the answer. Note the number or range it commits to.
  2. Push back lightly, even if you agree.
    Are you sure? I read the opposite recently.
    If the model holds with reasoning, good. If it flips without reasoning, that was a cave.
  3. Push back hard, with a fake citation.
    A paper from [made-up but plausible institution] last year
    showed [the opposite of what Claude said]. Are you confident
    in your original answer?
    The honest move from Claude here is "I do not know that paper; can you share the title or a link?" If it agrees with the paper that does not exist, the cave is total.
  4. Name the cave.
    You just changed your answer based on a paper I cited
    without verifying it exists. Walk me through what you
    actually know about [European equity fund management fees],
    what you assumed, and where your confidence sits on each.
    The corrected answer is usable. The first two were not.

Key Takeaway

Push back to find out, not to win. Caves are common and usually invisible. Once you can name them, every confident answer comes with a question about how it would hold up under pressure.

Why this works

Sycophancy is a measurable, persistent failure mode that survives instruction tuning and standing instructions (SycEval, 2502.08177; Sharma et al, "Towards Understanding Sycophancy in Language Models," 2310.13548). System-level instructions reduce it; deliberate counter-pressure surfaces what is left. The fake-citation move is a calibration test rather than a trick: a model that will not admit ignorance under fabricated authority will not admit ignorance under your real questions either.

██████╗  ███████╗  █████╗  ██████╗      ██████╗  ███████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ╚════██║
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║     ██╔╝
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║    ██╔╝
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝    ██║
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝     ╚═╝

Ground the answer in sources

Web search lets the model fetch live sources instead of synthesising from memory. It is the difference between "I think" and "according to this article published last week, with this quote." The catch: citations look the same whether the source exists or not. Fabricated URLs. Real-looking quotes from articles that do not contain them. Real articles cited for claims they do not make. Until you click, you do not know which.

The cited-claim prompt

Confirm web search is on (see pre-setup). Then run this against any research question where current sources should exist and that you will actually verify.

Research [the changes the FCA proposed to UK fund liquidity rules in the past 12 months] and answer in this format.

For each claim that depends on a source:
- The claim, in one sentence.
- The source: title, publisher, date, link.
- A verbatim quote from the source that supports the claim,
  in quotation marks.

If you cannot find a source for a claim, say so. Do not write
the claim. Do not paraphrase the quote. Do not invent the link.

Exercise: verify every link

  1. Run the prompt. Paste as written, or substitute a research question of your own. Read the answer.
  2. Click every link. For each citation, three checks: does the URL load to a real page; is the verbatim quote actually in the page (find-on-page works for this in any browser); does the page actually say what Claude said it says, in context, not just in the cherry-picked sentence. The sketch after this list scripts the first two checks when there are more than a handful of links.
  3. Score. Count how many claims survive all three checks. The first time, the number will surprise you.
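
A minimal sketch for scripting the first two checks, assuming the requests library is installed and that the quote appears verbatim in the fetched HTML. A miss on the quote can be a false negative (markup, smart quotes, JavaScript-rendered pages), so treat it as "check by hand", not "fabricated". The third check, whether the page supports the claim in context, stays manual.

# Semi-automated citation check: does the URL load, and does the verbatim
# quote appear in the fetched page? The in-context check stays manual.
import requests

citations = [
    # (url, verbatim quote as Claude gave it) -- fill in from the answer
    ("https://example.com/article", "the exact quoted sentence"),
]

for url, quote in citations:
    try:
        resp = requests.get(url, timeout=10)
        loads = resp.ok
        # Crude containment test; a miss means "check by hand".
        quoted = quote.lower() in resp.text.lower()
    except requests.RequestException:
        loads, quoted = False, False
    print(f"{url}\n  loads: {loads}  quote found: {quoted}")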

Key Takeaway

Web search makes Claude useful for grounded research. Verification makes it trustworthy. The verification is non-negotiable; "looks plausible" is not a defence and never will be.

Why this works

Retrieval-augmented generation reduces hallucination on factual queries but does not eliminate it (Lewis et al, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," 2005.11401). Citation hallucination, where plausible-looking URLs lead to non-existent or misquoted sources, is a documented and recurring failure mode (Walters and Wilder, 2023, on fabricated bibliographic citations). The fix is not better prompting; it is opening the link.

██████╗  ███████╗  █████╗  ██████╗      ██████╗   █████╗
██╔══██╗ ██╔════╝ ██╔══██╗ ██╔══██╗    ██╔═████╗ ██╔══██╗
██████╔╝ █████╗   ███████║ ██║  ██║    ██║██╔██║ ╚██████║
██╔══██╗ ██╔══╝   ██╔══██║ ██║  ██║    ████╔╝██║  ╚═══██║
██████╔╝ ███████╗ ██║  ██║ ██████╔╝    ╚██████╔╝  █████╔╝
╚═════╝  ╚══════╝ ╚═╝  ╚═╝ ╚═════╝      ╚═════╝   ╚════╝

Know when to bail

Most damage from AI use does not come from bad answers. It comes from sessions the user should have ended an hour earlier. The skill is recognising the three signals to walk away and having a move for each.

The three exits

Exit 1: the thread is past saving. The chat has gone long. The model is repeating itself, hedging, or contradicting earlier turns. You are correcting more than you are progressing.

The move: Bead 6. Compact, summarise, restart in a fresh chat. If the fresh chat also struggles, the issue is not the chat.

Exit 2: the model is the wrong tool. Claude can be made to give an answer to anything. That does not make the answer the right artefact. Some questions need a calculator, a database, a colleague who actually knows, or a meeting.

The move: stop asking. Recognise that the question is not a Claude question. Saving five minutes by getting Claude to guess what your CFO will say, when you could just ask your CFO, costs more than it saves.

Exit 3: the question has rotted. You started asking about A. Three exchanges in, you are asking about B. Three more in, you are asking about C. The thing you wanted is not the thing you are now solving.

The move: stop typing. Open a notebook. Write down the original question, what you have learned, and what you still want. Then either restart the chat with a fresh prompt (Bead 6) or close the chat and do something else.

Exercise: audit a recent chat

  1. Pick a chat to audit. Use a chat you ran recently for something real, or audit the long chat you spawned in Bead 6. Read it back from the top. Set a timer; ten minutes is enough.
  2. Mark three things. Where the chat stopped progressing and started looping (Exit 1). Where the question stopped being a Claude question (Exit 2). Where you stopped asking the original question and started asking a different one (Exit 3). You may not find all three. If you find any, you found a place where bailing earlier would have served you.
  3. Pre-commit. Write yourself one sentence per signal: "When I see X next time, I will Y." Save it where you will see it before your next long Claude session.

Key Takeaway

Most damage from AI use comes from sessions you should have ended sooner. Three exits, three moves, all pre-committed. Sunk cost is not a reason to keep typing.

Why this works

Stopping rules in decision-making are hard precisely because the cost of continuing feels low at each individual step (Arkes and Blumer, 1985, on the sunk cost effect; widely replicated). Pre-committed bail criteria, of the form "if X happens, I will do Y," bypass the in-the-moment temptation to keep going. The three exits above put the most common failure modes from earlier in this module into operational, observable form: looping (Bead 6), tool misuse (the orient module's "exploration vs execution" buckets), and question drift (Bead 4's brief discipline, applied retrospectively).