You don’t always need a new model. Most of the time, you just need better examples.
Large language models aren’t “smart” in the human sense; they’re pattern addicts. Give them the right patterns in the prompt, and they start behaving like a domain expert you never trained.
This is basically what Few‑Shot In‑Context Learning (ICL) is about: instead of retraining or fine‑tuning the model, you inject knowledge directly through carefully crafted examples in the prompt.
In this piece, we’ll walk through:
- What “knowledge injection” really means in practice
- How Few‑Shot‑in‑Context sits between “do nothing” and “full fine‑tune”
- A concrete mental model of how LLMs learn from examples
- Five battle‑tested design principles for good few‑shot prompts
- Cross‑industry examples (medical, finance, programming, education, law)
- Six common failure modes — and how to debug them
- Where this technique is going next (RAG, multi‑modal, personal knowledge, etc.)
If you’re tired of hearing “we need to fine‑tune a custom model” every time someone wants to add new knowledge… this article is for you.
1. Why “knowledge injection” matters (and where Few‑Shot ICL fits)
Most real‑world LLM problems boil down to one painful fact:
The model doesn’t actually know what you care about.
There are two big gaps:
- Fresh knowledge. Anything that happened after the model’s training cutoff:
- 2024–2025 policy changes
- New internal processes
- Product updates, pricing changes, API deprecations…
- Niche / long‑tail knowledge. Things that were never common enough to appear in large quantities online:
- A tiny vertical’s industry standard
- Rare diseases’ treatment guidelines
- Your company’s weird internal naming conventions
The classic response is:
“Let’s fine‑tune a model.”
Which sounds cool until you realise what that implies:
- Curate + clean + label a dataset
- Pay for training (GPU hours, infra, dev time)
- Wait days or weeks
- Repeat whenever something changes
For many teams (and almost all solo builders), that’s overkill.
Enter Few‑Shot‑in‑Context
Few‑Shot‑in‑Context Learning = you don’t touch model weights. You just:
- Pick 3–5 good examples that:
  - Embed the knowledge you care about
  - Match the task format you want
- Stick them into the prompt before the actual query
- Let the model infer the pattern and apply it to new inputs
You’ve just “fine‑tuned” the model’s behaviour… without training anything.
Example: suppose you want the model to answer questions about a 2025 EV subsidy policy that it’s never seen.
Instead of fine‑tuning, you can drop something like this into the prompt:
Task: Decide if a car model qualifies for the 2025 EV subsidy and explain why.
Example 1
Input: Battery EV, 500km range
Output: Eligible. Reason: battery electric vehicle with range ≥ 500km gets a 15,000 GBP subsidy.
Example 2
Input: Plug‑in hybrid, 520km range
Output: Eligible. Reason: plug‑in hybrid with range ≥ 500km also gets 15,000 GBP.
Example 3
Input: Battery EV, 480km range
Output: Not eligible. Reason: 480km < 500km threshold, so no subsidy.
Now solve this:
Input: Battery EV, 530km range
Output:
From these 3 examples, the model can infer the rule:
“Type ∈ {BEV, PHEV} AND range ≥ 500km → 15k subsidy; otherwise 0.”
That’s knowledge injection via examples.
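If you'd rather see that assembled in code, here is a minimal sketch. The example data comes straight from the prompt above; the call_llm placeholder stands in for whichever model client you actually use.

EXAMPLES = [
    {"input": "Battery EV, 500km range",
     "output": "Eligible. Reason: battery electric vehicle with range >= 500km gets a 15,000 GBP subsidy."},
    {"input": "Plug-in hybrid, 520km range",
     "output": "Eligible. Reason: plug-in hybrid with range >= 500km also gets 15,000 GBP."},
    {"input": "Battery EV, 480km range",
     "output": "Not eligible. Reason: 480km < 500km threshold, so no subsidy."},
]

TASK = "Decide if a car model qualifies for the 2025 EV subsidy and explain why."


def build_prompt(query: str) -> str:
    """Assemble the task description, worked examples, and the new query into one prompt."""
    blocks = [f"Task: {TASK}", ""]
    for i, ex in enumerate(EXAMPLES, start=1):
        blocks += [f"Example {i}", f"Input: {ex['input']}", f"Output: {ex['output']}", ""]
    blocks += ["Now solve this:", f"Input: {query}", "Output:"]
    return "\n".join(blocks)


print(build_prompt("Battery EV, 530km range"))
# answer = call_llm(build_prompt("Battery EV, 530km range"))  # call_llm is a placeholder for your own client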
2. How Few‑Shot‑in‑Context actually works (mental model)
To understand why this works, forget about gradient descent and transformers for a second and think in more human terms.
When you few‑shot an LLM, it roughly goes through three internal phases:
2.1 Example parsing: “What’s going on here?”
The model scans your prompt and:
- Detects the task pattern: “Ah, it’s always Task → Input → Output.”
- Extracts knowledge bits: “EV subsidy depends on:
  - type: BEV / PHEV vs fuel car
  - range: threshold 500km
  - standard subsidy amount: 15k”
It’s not memorising the car models; it’s capturing the relationship between input features and output labels.
2.2 Pattern induction: “What’s the general rule?”
Given multiple examples, the model tries to generalise:
- Example A: BEV, 500km → 15k
- Example B: PHEV, 520km → 15k
- Example C: BEV, 480km → 0
- Example D: Fuel car, 600km → 0
It can then infer something like:
“Subsidy depends on being a new energy vehicle AND hitting the range threshold. Fuel cars never qualify.”
This is the “few‑shot learning” part: very little data, but high‑signal structure.
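To make that induced rule concrete, here is what it looks like written out as Python. This is purely illustrative: the model never executes anything like this; it just starts behaving as if it had.

# Illustrative only: the rule the examples imply, written out explicitly.
def subsidy(vehicle_type: str, range_km: int) -> int:
    """Hypothetical 2025 EV subsidy rule induced from examples A-D."""
    is_new_energy = vehicle_type in {"BEV", "PHEV"}
    meets_range = range_km >= 500
    return 15_000 if (is_new_energy and meets_range) else 0


assert subsidy("BEV", 500) == 15_000   # Example A
assert subsidy("PHEV", 520) == 15_000  # Example B
assert subsidy("BEV", 480) == 0        # Example C
assert subsidy("Fuel", 600) == 0       # Example D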
2.3 Task transfer: “Apply that rule over here”
When the user sends a new query — say:
Input: PHEV, 510km
The model doesn’t go “I’ve seen this exact string before”. It goes: “This matches the pattern → I know the rule → apply → 15k subsidy.”
This is the crucial distinction:
Good few‑shot prompts teach the model the logic, not just the answer.
If your examples are just random Q&A with no visible structure, the model may end up parroting instead of reasoning. The rest of this article is about avoiding that.
3. Five design principles for Few‑Shot knowledge injection
You can’t just throw examples at the model and hope for the best. The quality of your examples is 90% of the game.
Here are five principles I keep coming back to.
3.1 Cover both core dimensions and edge cases
Real rules always have:
- Normal cases (most inputs)
- Edge cases (boundary conditions, exclusions, weird situations)
Your examples should cover both.
Bad pattern (only “happy paths”):
Example 1: BEV, 500km → subsidy
Example 2: PHEV, 520km → subsidy
The model might incorrectly infer:
“If it’s an EV and the number looks big-ish, say ‘subsidy’.”
Better pattern (explicit edges):
Example 1: BEV, 500km → eligible (base case)
Example 2: PHEV, 520km → eligible (second base case)
Example 3: BEV, 480km → not eligible (range too low)
Example 4: Fuel car, 600km → not eligible (wrong type)
Now the model has:
- Dimensions: drive type, range
- Edges:
  - Range < threshold
  - Non‑EV types
Design rule:
For any rule you’re encoding, ask: “What are the obvious ‘no’ cases? Did I show at least one of each?”
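If your examples live in a structured form anyway, you can make that question a mechanical check. A rough sketch, using made-up field names for the subsidy case:

# Rough coverage check: does every dimension have at least one failing example?
# The example set and field names are illustrative, not a real dataset.
examples = [
    {"type": "BEV",  "range_km": 500, "eligible": True},   # base case
    {"type": "PHEV", "range_km": 520, "eligible": True},   # second base case
    {"type": "BEV",  "range_km": 480, "eligible": False},  # range too low
    {"type": "Fuel", "range_km": 600, "eligible": False},  # wrong type
]

has_range_negative = any(e["range_km"] < 500 and not e["eligible"] for e in examples)
has_type_negative = any(e["type"] not in {"BEV", "PHEV"} and not e["eligible"] for e in examples)

assert has_range_negative, "show at least one 'range too low' case"
assert has_type_negative, "show at least one 'wrong vehicle type' case"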
3.2 Keep the format perfectly consistent
Models are surprisingly picky about format. If your examples look one way and your “real” task looks another, performance tanks.
You need consistency at three levels:
- Structure — same high‑level layout, e.g. always Task → Input → Output, in that order.
- Terminology — same words for the same concepts. Don’t alternate between EV, electric car, and new energy car unless you really have to.
- Output shape — same style of answer, e.g. always “Yes/No + Reason”, or always a JSON object, or always a 3‑bullet explanation.
Inconsistent example (what not to do):
Example 1: “2025 EV subsidy: BEV A, 500km → 15k.” (terse)
Example 2: “According to the 2025 regulation, model B qualifies for…”
Task: “Explain whether model C qualifies and justify your reasoning.”
The model has to guess which style to imitate.
Consistent example (what you want):
Task: Decide if the car qualifies for subsidy and explain why.
Example 1
Input: Type=BEV, Range=500km
Output: Eligible.
Reason: Battery EV with range ≥ 500km meets the threshold; base subsidy is 15,000 GBP.
Example 2
Input: Type=PHEV, Range=490km
Output: Not eligible.
Reason: Plug‑in hybrid but range 490km < 500km threshold, so no subsidy.
Now solve this:
Input: Type=BEV, Range=530km
Output:
Everything lines up, so the model can just continue the pattern.
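One cheap way to guarantee this is to render every example and the final query through the same formatting function, so the fields literally cannot drift. A small sketch using the Type/Range fields from above:

# One formatter for both the worked examples and the real query,
# so structure, field names, and output shape stay identical by construction.
def render_case(vehicle_type: str, range_km: int, answer: str | None = None) -> str:
    input_line = f"Input: Type={vehicle_type}, Range={range_km}km"
    output_line = f"Output: {answer}" if answer is not None else "Output:"
    return f"{input_line}\n{output_line}"


examples = [
    render_case("BEV", 500, "Eligible.\nReason: Battery EV with range >= 500km meets the threshold; base subsidy is 15,000 GBP."),
    render_case("PHEV", 490, "Not eligible.\nReason: Plug-in hybrid but range 490km < 500km threshold, so no subsidy."),
]
query = render_case("BEV", 530)  # answer left blank for the model to continue

print("\n\n".join(examples + ["Now solve this:\n" + query]))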
3.3 Sample count: 3–5 examples is usually enough
This one is empirical but holds up annoyingly well:
- Fewer than 3 examples → shaky generalisation, easy overfitting
- More than 5–6 examples → diminishing returns + wasted context
Why not 20 examples? Because:
- LLMs have a context window; you pay in tokens
- Extra examples can actually dilute the pattern if they’re noisy
- You often don’t have 20 high‑quality labelled examples for a new rule
The sweet spot for most tasks is:
- 3–4 examples for simple rules
- 4–5 examples when you need to show edge cases + 1–2 tricky combos
If you feel you need 12 examples, it’s often a smell that:
- You haven’t factored the rule cleanly, or
- You’re mixing multiple tasks into one prompt
3.4 Your examples must be factually correct
This sounds obvious, but in practice… we all copy‑paste in a hurry.
The model assumes your examples are ground truth. If you smuggle in a bug, it’ll confidently reproduce it everywhere.
Example of hidden poison:
Example 1: Range ≥ 500km → subsidy 15,000
Example 2: Range 520km → subsidy 20,000
Now the model has to guess whether:
- You changed the policy halfway through, or
- You made a mistake
There’s no way for it to “correct” you; it’s not cross‑checking with the internet.
Treat your few‑shot block as production code:
- Double‑check numbers & thresholds
- Make sure all examples obey the same rule
- If you’re encoding law / medicine / finance, verify against the source document
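When the rule is mechanical enough to express in code, you can go further and lint the few-shot block before it ever reaches the model. A sketch, reusing the hypothetical subsidy rule from earlier:

# Lint the few-shot block against a reference implementation of the rule,
# so a wrong threshold or a contradictory example is caught before the model sees it.
def reference_rule(vehicle_type: str, range_km: int) -> int:
    # Hypothetical 2025 policy: BEV/PHEV with range >= 500km gets 15,000 GBP, otherwise 0.
    return 15_000 if vehicle_type in {"BEV", "PHEV"} and range_km >= 500 else 0


few_shot_examples = [
    {"type": "BEV",  "range_km": 500, "subsidy": 15_000},
    {"type": "PHEV", "range_km": 520, "subsidy": 20_000},  # the "hidden poison" from above
]

for i, ex in enumerate(few_shot_examples, start=1):
    expected = reference_rule(ex["type"], ex["range_km"])
    if ex["subsidy"] != expected:
        print(f"Example {i} contradicts the policy: labelled {ex['subsidy']}, rule says {expected}")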
3.5 Order your examples from simple → complex
Humans don’t like learning by being dropped straight into edge cases; neither do LLMs.
If your first example already mixes three special conditions, the model might:
- Over‑weight that weird case
- Miss the simpler underlying rule
A nicer pattern:
- Base case A
- Base case B
- Clear negative / boundary
- Then maybe one tricky combo
Example in subsidy land:
Example 1: Simple positive (type OK + range OK)
Example 2: Simple negative (type OK + range too low)
Example 3: Simple negative (wrong type)
Example 4: Complex (meets base rule + qualifies for extra top‑up)
Example 5: Complex (meets base rule but not the extra condition)
By the time you show examples 4 and 5, the model already knows the base rule and can now learn the extra dimension cleanly.
4. Cross‑industry case studies
Let’s look at how this plays out in different domains. Same technique, very different flavours.
We’ll keep the prompts compact but expressive; you can expand them in your own stack.
4.1 Medical: encoding a rare disease guideline (HAE)
Problem
You want the model to assist clinicians with a 2024 guideline on a rare disease:
Hereditary Angioedema (HAE) – recurrent non‑itchy swelling – C1‑INH deficiency – often misdiagnosed as allergy
The base model either:
- Has never seen the latest guideline, or
- Hallucinates based on generic “allergy” patterns
Instead of fine‑tuning, you inject the guideline logic as examples.
Few‑shot pattern
You define a task:
Task: Based on the 2024 HAE guideline, decide:
1) Does this case meet the diagnostic criteria?
2) What is the reasoning?
3) What is an appropriate initial management plan?
Then you show 3–4 cases that span:
- Typical family case
- Clear non‑HAE allergic case
- “Sporadic” case with no family history but positive lab markers
- Another non‑HAE case with itchy rash
Each example is shaped like:
Example 1
Input (summary):
- Recurrent facial swelling, non‑itchy
- Mother with similar episodes
- C1‑INH activity 20% (low)
Output:
1) Diagnosis: Consistent with HAE.
2) Reasoning: Non‑itchy angioedema + positive family history + low C1‑INH.
3) Management: …
Then your real query:
Now analyse this patient:
- 35‑year‑old male
- Recurrent eyelid swelling for 2 years
- No itch, no urticaria
- Father with similar symptoms
- C1‑INH activity 25% (low), C4 reduced
Return the same 3‑part structure.
The model doesn’t need to “know” HAE from training — it learns the diagnostic logic on the fly from your few examples.
You still need a human in the loop (this is medicine), but the model stops hallucinating generic “take some antihistamines” advice.
4.2 Finance: injecting a brand‑new IPO regulation
Problem
You’re building an assistant for investment bankers analysing 2025 IPO rules for a specific exchange (say, a revised science‑tech board in the UK).
The rulebook changed in March 2025:
- New market‑cap + revenue + profit combos
- Special rules for red‑chip / dual‑class structures
- Stricter restrictions on use of proceeds
Your base model has never seen this document.
Few‑shot pattern
You declare a task:
Task: Based on the 2025 revised IPO rules, decide whether a company qualifies for listing on the sci‑tech board. Explain your reasoning along these dimensions:
- Market cap & financial metrics
- Tech / “hard‑tech” profile
- Use of proceeds
- Any special structure (red‑chip, VIE, dual‑class)
Then you show three archetypal examples:
- Clean domestic hard‑tech issuer
  - R&D‑heavy AI chip company
  - Market cap ≥ 50B, revenue ≥ 10B, net profit ≥ 2B
  - Proceeds used to build new chip fab
  - Result: qualifies
- Red‑chip, loss‑making biotech
  - Cayman / Hong Kong structure
  - Market cap ≥ 150B, loss‑making but high R&D ratio
  - Proceeds for overseas R&D centres
  - Result: qualifies under special red‑chip route
- Non‑tech traditional business misusing funds
  - Clothing manufacturer
  - Market cap and profits below threshold
  - Wants to use proceeds to buy wealth‑management products
  - Result: does not qualify (fails both “hard‑tech” positioning and funds‑use rule)
Now when you feed in:
Company D:
- Hong Kong‑registered red‑chip
- Quantum computing hardware
- Projected post‑IPO market cap: 180B
- Still loss‑making; R&D / revenue ratio 30%
- No VIE structure
- Proceeds used to hire researchers and develop quantum algorithms
Question: Does it qualify? Explain along the same dimensions as the examples.
The model can map this onto the red‑chip example and apply the rule: “large market cap + genuine hard‑tech + loss‑making allowed under route X + proceeds used for R&D” → qualifies.
No regulation fine‑tuning required.
4.3 Programming: teaching Python 3.12 type parameter syntax
Problem
You’re generating code, but the base model:
- Was trained mostly on Python ≤ 3.11
- Keeps suggesting TypeVar boilerplate
- Doesn’t use Python 3.12’s new generic syntax
You want the model to write idiomatic 3.12 code, today.
Key new idea
Instead of:
from typing import TypeVar

T = TypeVar("T")

def identity(x: T) -> T:
    return x
Python 3.12 lets you write:
def identity[T](x: T) -> T:
    return x
And similarly for classes:
class Stack[T]:
    ...
Few‑shot pattern
Define the task:
Task: You are coding in Python 3.12.
- Prefer the new type parameter syntax: def func[T](...) -> ...
- Prefer built‑in generics (list[T], dict[str, T]) over typing.List / typing.Dict.
- When you see older TypeVar patterns, refactor them into the new style.
Then show a few before/after examples.
Example 1 — basic function
# Before (pre‑3.12)
from typing import TypeVar

T = TypeVar("T")

def echo(x: T) -> T:
    return x

# After (Python 3.12)
def echo[T](x: T) -> T:
    return x
Example 2 — list generics
# Before
from typing import TypeVar, List

U = TypeVar("U")

def head(values: List[U]) -> U:
    return values[0]

# After
def head[U](values: list[U]) -> U:
    return values[0]
Example 3 — generic class
# Before
from typing import TypeVar, Dict, Generic

K = TypeVar("K")
V = TypeVar("V")

class SimpleCache(Generic[K, V]):
    def __init__(self) -> None:
        self._data: Dict[K, V] = {}

    def get(self, key: K) -> V | None:
        return self._data.get(key)

# After
class SimpleCache[K, V]:
    def __init__(self) -> None:
        self._data: dict[K, V] = {}

    def get(self, key: K) -> V | None:
        return self._data.get(key)
Example 4 — dictionary helper (the “target” pattern)
Now you show a case very close to what you actually want:
# Before
from typing import TypeVar, Dict

T = TypeVar("T")

def get_value(d: Dict[str, T], key: str) -> T | None:
    return d.get(key)

# After
def get_value[T](d: dict[str, T], key: str) -> T | None:
    return d.get(key)
Now when you ask:
“Write a generic lookup function in Python 3.12 that looks up a key in a dict[str, T] and raises KeyError if missing, using the new type parameter syntax.”
…you’ll get something like:
def lookup[T](store: dict[str, T], key: str) -> T:
    if key not in store:
        raise KeyError(key)
    return store[key]
Exactly the behaviour you want — without touching the model weights.
4.4 Education: encoding a new curriculum standard
Problem
In 2025, the middle‑school math curriculum adds a new unit:
- “Data & algorithms basics”
- Simple visualisation (e.g. box plots)
- Basic algorithm concepts (e.g. bubble sort, described in natural language)
You’re building a teacher‑assistant tool that:
- Reviews lesson plans
- Flags whether they align with the new standard
- Explains why in teacher‑friendly language
Few‑shot pattern
Define the task:
Task: Given a lesson description for grades 7–9, decide:
1) Does it align with the 2025 “Data & Algorithms Basics” standard?
2) Why or why not?
3) If not, how could it be adjusted?
Show three archetypes:
- Good “box plot” lesson
  - Students compute quartiles for real exam‑score data
  - Draw box plots
  - Discuss concentration and spread → Meets “data visualisation” requirement
- Good “bubble sort” lesson
  - Teacher explains algorithm with diagrams
  - Students describe steps in natural language
  - No Python coding required → Matches “understand algorithm idea”, not “implement deep learning”
- Overkill AI lesson
  - Teacher asks 9th graders to code a neural network in Python
  - Predict exam scores from features → Way beyond the standard; flagged as misaligned
Then test case:
Lesson: Grade 8 “Data & Algorithms Basics”.
- Students design a simple random sampling plan to pick 50 students out of 2,000.
- They collect “weekly exercise time” data.
- They draw a box plot of exercise time and discuss the results.
Question: Aligned or not? Analyse using the three points as in the examples.
The model learns that random sampling + box plots + interpretation is very much in‑scope for the new standard, and it can explain that in the same structure as the examples.
4.5 Law: encoding new labour‑law amendments
Problem
You have a legal assistant for HR that needs to know about 2025 labour‑law amendments, including:
- Occupational injury insurance for platform workers / gig economy
- How to measure working time under remote work
- Minimum compensation for non‑compete agreements
The base model doesn’t know about the 2025 changes.
Few‑shot pattern
Define the task:
Task: Given a short labour dispute case, do:
1) Legal assessment under the 2025 amendments
2) Cite the relevant new article(s) in plain language
3) Suggest next steps for the worker and/or employer
Show three examples:
- Gig worker injury
  - Platform courier, no formal contract, injured on delivery
  - Platform refuses compensation → New rule: factual employment + platform must pay into injury insurance → Suggest: apply for recognition of labour relationship, then injury claim
- Remote work time tracking
  - Employer arbitrarily deducts pay by claiming “low efficiency”
  - No proper time‑tracking system → New rule: use “effective working time” and require documented tracking → Suggest: ask employer for evidence; if none, pay must be corrected
- Non‑compete with absurdly low pay
  - 2‑year non‑compete, but only 500 GBP/month compensation → New rule: at least 30% of average salary and no lower than minimum wage → Worker can refuse to comply or demand revised compensation
Then you ask about a design worker who:
- Worked fully remotely
- Got only 3,000 GBP in June, while their historical average is 8,000
- Employer never tracked “effective working time”
- Employer claims “your efficiency was low”
The model can now chain:
- This matches the remote work example
- Employer failed their duty to track time
- Worker can demand full pay based on historical average, minus what’s been paid
Again: the law text itself lives outside the model. The usable logic gets distilled into 3–4 examples.
5. Six common failure modes (and how to fix them)
Even with the right idea, few‑shot prompts can fail in subtle ways. Let’s debug some typical issues.
5.1 The model just parrots your examples
Symptom
- Works fine on seen cases
- On new input, it copies values from examples or repeats entire example answers
Likely causes
- Examples are purely concrete — lots of “case A / case B” but no visible rule
- Task wording doesn’t force generalisation
Fixes
- Explicitly encode the rule once
Instead of only:
Example: EV A, 500km → 15k
Example: EV B, 520km → 15k
Add a line like:
Rule: For all BEVs and PHEVs with range ≥ 500km, subsidy is 15,000 GBP.
- Align features
If your examples all use “Car: A, range 500, type BEV”, and your real task says “Model E, long‑range battery, SUV body style”… the model may not spot that “long‑range battery” = “range feature”.
Be boring and explicit:
- Always show the same fields: Type=…, Range=…, BodyStyle=…
5.2 The model ignores edge cases
Symptom
- Handles normal cases well
- Fails on boundary or special cases that were in your examples
Example: you showed one “HAE with no family history” case, but the model still insists “no family history → no HAE”.
Likely causes
- Only a single edge example vs many normal examples
- Edge case buried at the end with no emphasis
Fixes
- Add at least two edge‑case examples
- Label them clearly, e.g.:
[Boundary case] Patient C: no family history, but low C1‑INH and typical symptoms → still HAE.
Explicit tags like [Boundary case] or [Special exception] help the model treat them as part of the rule, not noise.
5.3 Examples are too long (signal drowned in noise)
Symptom
- Prompt looks like a wall of text
- Model latches onto random parts (e.g. story flavour) instead of the core logic
- Sometimes it even ignores later examples because the context is overloaded
Likely causes
- You copy‑pasted entire documents instead of minimal supervised examples
- You included narrative fluff, version history, citations, etc.
Fixes
- Trim each example to: Input → Output → Short explanation
- Put long policy / guideline text in a separate section above, and summarise the operative rule in the example
Try to keep:
- Each example ≤ ~100 tokens if possible
- All examples + instructions under ~80% of your context window, leaving room for the actual query and model’s reasoning
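A crude budget check is usually enough here. The 4-characters-per-token ratio below is only a rough English-text heuristic; swap in your model's real tokenizer if you need exact counts:

# Crude token-budget check for a few-shot block.
# ~4 characters per token is a rough English-text heuristic, not an exact count.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)


CONTEXT_WINDOW = 8_000              # whatever your model actually offers
BUDGET = int(CONTEXT_WINDOW * 0.8)  # leave ~20% for the query and the model's answer

instructions = "Task: Decide if the car qualifies for subsidy and explain why."
examples = [
    "Input: Type=BEV, Range=500km\nOutput: Eligible. Reason: meets the 500km threshold.",
    "Input: Type=PHEV, Range=490km\nOutput: Not eligible. Reason: below the 500km threshold.",
]

for i, ex in enumerate(examples, start=1):
    if approx_tokens(ex) > 100:
        print(f"Example {i} is over ~100 tokens; consider trimming it")

total = approx_tokens(instructions) + sum(approx_tokens(ex) for ex in examples)
if total > BUDGET:
    print(f"Few-shot block (~{total} tokens) is over 80% of the context window")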
5.4 Terminology collision across domains
Symptom
- You re‑use generic terms like “subsidy”, “margin”, “policy” in multiple domains
- The model mixes up meanings (e.g. subsidy in finance vs subsidy in automotive)
Likely causes
- No domain qualification: “subsidy” means different things in 3 prompts
- Inconsistent phrasing: “post‑IPO market cap” vs “total value after listing” vs “size”
Fixes
- Define terms in your instructions:
In this task, “market cap” always means “projected post‑IPO market cap” (shares × offering price).
- Use domain‑specific names where possible: ev_subsidy, ipo_post_listing_market_cap, hae_lab_marker
Consistency beats cleverness.
5.5 Small models can’t juggle too many conditions
Symptom
- A big model (e.g. GPT‑4‑class) handles your prompt perfectly
- A smaller one (7B / 13B) seems to ignore half the conditions
Example: in HAE diagnosis, the small model only looks at symptoms but ignores lab values and family history.
Likely causes
- You’re asking it to learn a 3–4‑dimensional rule in one shot
- Each example mixes many conditions at once
Fixes
Break the problem into layers of examples:
- Single‑dimension examples
  - Only symptoms: itchy vs non‑itchy swelling
- Two‑dimension examples
  - Symptoms + lab values
- Three‑dimension examples
  - Symptoms + lab values + family history
The smaller model can first lock in:
- “Non‑itchy swelling” vs “allergic swelling”
- Then “low C1‑INH” vs “normal”
- Then “family history strengthens the case”
Instead of being hit with all three dimensions at once.
5.6 Silent errors in your examples poison everything
Symptom
- Everything feels coherent
- But the model’s answers are consistently off from the real rules
When you go back and re‑read your few‑shot block, you find:
- One threshold is wrong
- Or two examples contradict each other
Fixes
Treat prompt debugging like code review:
- Validate against the source of truth
  - Policy document
  - Official guideline
  - Legal article
- Check internal consistency
  - Same thresholds everywhere
  - Same units (km vs miles, %)
  - No “≥ 450km” in one example and “≥ 500km” in the next
If you fix the examples and the model’s behaviour changes, you just proved that your few‑shot block is acting like a miniature “training set in context” — which is exactly the point.
6. Where this is going next
Few‑Shot‑in‑Context is not a temporary hack; it’s becoming part of the core LLM engineering toolbox, especially when combined with other techniques.
A few directions that are already practical today:
6.1 Few‑shot + RAG = dynamic knowledge injection
Instead of hard‑coding your examples into the prompt, you can:
- Store policy snippets / typical cases in a vector DB
- At query time:
  - Retrieve the most relevant 3–5 items
  - Format them as few‑shot examples
  - Feed them to the model
You get:
- Up‑to‑date knowledge (change the DB, not the model)
- Domain‑specific behaviour
- No retraining loops
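Here is a minimal sketch of that loop. The word-overlap scorer is a toy stand-in for a real embedding model and vector store, and the knowledge base is just the subsidy cases from earlier:

# Few-shot + RAG in miniature: retrieve the most relevant stored cases at query time,
# then format them as few-shot examples.
KNOWLEDGE_BASE = [
    {"input": "Battery EV, 480km range", "output": "Not eligible. Reason: below the 500km threshold."},
    {"input": "Plug-in hybrid, 520km range", "output": "Eligible. Reason: PHEV with range >= 500km."},
    {"input": "Fuel car, 600km range", "output": "Not eligible. Reason: not a new energy vehicle."},
]


def score(query: str, doc: str) -> float:
    """Toy relevance score via word overlap; replace with embedding similarity in practice."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(1, len(q | d))


def retrieve_examples(query: str, k: int = 3) -> list[dict]:
    return sorted(KNOWLEDGE_BASE, key=lambda ex: score(query, ex["input"]), reverse=True)[:k]


def build_rag_prompt(query: str) -> str:
    parts = ["Task: Decide if a car model qualifies for the 2025 EV subsidy and explain why.", ""]
    for i, ex in enumerate(retrieve_examples(query), start=1):
        parts += [f"Example {i}", f"Input: {ex['input']}", f"Output: {ex['output']}", ""]
    parts += ["Now solve this:", f"Input: {query}", "Output:"]
    return "\n".join(parts)


print(build_rag_prompt("Battery EV, 530km range"))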
6.2 Multi‑modal few‑shot
With vision‑capable models, examples don’t have to be text‑only.
You can show:
- An image of a box plot + text interpretation → teach data‑viz reading
- A scan of a legal clause + structured summary → teach contract analysis
- A medical image + diagnosis → teach pattern recognition frameworks (with humans supervising)
The principle is the same: a tiny set of high‑quality, well‑structured examples sets the behaviour.
6.3 Personal and team‑level “knowledge presets”
For individuals and small teams, we’ll likely see:
- Tools that help you generate few‑shot blocks from your own notes
- “Profiles” that encode:
  - Your coding style
  - Your company’s policy interpretations
  - Your favourite data formats
Think of it as a “soft fine‑tune” you can edit in a text editor.
7. Takeaways
If you remember only three things from this article, make them these:
- Fine‑tuning is rarely your first move. Often, you can get 80–90% of the value by injecting knowledge with 3–5 carefully chosen examples.
- Good few‑shot prompts encode logic, not trivia. Cover both normal and edge cases, be brutally consistent in format, and keep examples factual and compact.
- Few‑shot is a bridge between raw models and real products. It lets you adapt general‑purpose LLMs to fast‑moving, niche, or private knowledge — on your laptop, in minutes, without a GPU farm.
Once you start thinking of prompts as tiny, editable “on‑the‑fly training sets”, you stop reaching for fine‑tuning by default — and start shipping faster.