If you’ve ever asked an LLM for a “clear explanation” and got back the same sentence wearing three different hats, welcome to the club.
Repetition usually isn’t malice—it’s math. Decoding is probability, and the highest-probability path often loops through safe phrases (“therefore”, “in conclusion”, “from a broader perspective”) like a commuter stuck on the Manchester–London line with two signal failures and a mysterious “operational issue”.
Penalties are the knobs that break that loop.
Why LLMs Repeat (And Why It’s Not Your Fault)
Repetition happens for three very boring reasons:
1) Decoding follows the highest-probability trail
At each step, the model picks the next token based on probabilities. Once a phrase becomes likely, it can keep becoming likely (momentum + local coherence).
2) Training data contains a lot of templates
Web writing is full of ritual phrases. The model learns them because they appear everywhere—and because they “work” statistically.
3) Your prompt leaves the exit door open
If you don’t define scope, format, and constraints, the model will often “pad” to sound complete—by paraphrasing itself.
You can fix #3 with better instructions. You fix #1 and #2 with penalties.
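Here's a toy illustration of reason #1 (a hand-rolled next-token table, not a real model): once a loop becomes the highest-probability path, greedy decoding never leaves it.

// Toy illustration only: a tiny hand-made next-token table plus greedy decoding.
// Once "in" → "conclusion" → "in" is the most likely path, always picking the
// argmax loops on it forever.
const nextTokenProbs = {
  growth: { is: 0.5, in: 0.3, therefore: 0.2 },
  is: { strong: 0.6, in: 0.4 },
  strong: { in: 0.7, ".": 0.3 },
  in: { conclusion: 0.8, "2026": 0.2 },
  conclusion: { in: 0.6, ".": 0.4 },
};

// Greedy decoding: always take the most likely next token.
function greedyNext(token) {
  const candidates = Object.entries(nextTokenProbs[token]);
  return candidates.sort((a, b) => b[1] - a[1])[0][0];
}

let token = "growth";
const output = [token];
for (let i = 0; i < 9; i++) {
  token = greedyNext(token);
  output.push(token);
}
console.log(output.join(" "));
// "growth is strong in conclusion in conclusion in conclusion in"

Real models sample rather than always taking the argmax, but the pull towards the comfy path is the same. Penalties exist to weaken it.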
The Penalty Trio (What They Actually Do)
Different platforms name these slightly differently, but the underlying idea is the same: change the odds of generating certain tokens.
Frequency penalty (token reuse tax)
In OpenAI-style APIs, frequency_penalty reduces the probability of tokens that have already appeared, proportionally to how often they appeared. Positive values make the model less likely to repeat itself.
Use it when:
- Your output repeats the same adjectives (“powerful”, “efficient”, “seamless”).
- The model keeps looping on the same “key benefit”.
Typical starting range:
- 0.3–0.8 for long-form explanations
- 0.8–1.2 for marketing copy (careful: too high can get weird)
Presence penalty (topic-hopping nudge)
presence_penalty penalizes tokens simply for having appeared at all, which encourages introducing new tokens/topics rather than staying on the same rails.
Use it when:
- The model keeps circling one idea without adding new dimensions.
- You want broader coverage (“give me 8 distinct angles”).
Typical starting range:
- 0.2–0.7
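Under the hood, both of these penalties adjust the same logits. OpenAI's docs describe the adjustment as roughly: take the token's raw score, subtract count × frequency_penalty, and subtract presence_penalty once if the token has appeared at all. A minimal sketch of that arithmetic (illustrative only; the real thing happens inside the model server):

// Sketch of the logit adjustment described in OpenAI's docs.
// `logit` is the raw score for a candidate token, `count` is how many
// times that token has already been sampled in the output so far.
function penalise(logit, count, frequencyPenalty, presencePenalty) {
  return (
    logit
    - count * frequencyPenalty              // grows with every repeat
    - (count > 0 ? 1 : 0) * presencePenalty // one flat hit once the token has appeared at all
  );
}

console.log(penalise(2.0, 3, 0.7, 0.3)); // ≈ -0.4: heavily discouraged after 3 uses
console.log(penalise(2.0, 0, 0.7, 0.3)); // 2: an unused token is left alone

The key difference: frequency_penalty keeps growing with every repeat, while presence_penalty is a one-off hit.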
Repetition penalty (open-source classic)
In Hugging Face/Transformers style generation, repetition_penalty is a single multiplier applied to previously generated tokens. 1.0 means no penalty; values above 1.0 penalise repeats.
Use it when:
- You’re running LLaMA/Mistral/Qwen locally and you see literal repeated phrases.
- The model starts “stuttering” until it hits the max token limit.
Typical starting range:
- 1.1–1.3
- Values above 1.5 can produce “funky outputs” in some tokenizers/models (it’s a blunt instrument).
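For reference, this is the multiplicative transform popularised by the CTRL paper and used in Hugging Face-style generation: logits of already-seen tokens are divided (if positive) or multiplied (if negative) by the penalty, so repeats always become less attractive. A minimal sketch (implementations vary in detail):

// Hugging Face-style repetition penalty, CTRL-paper formulation.
function repetitionPenalise(logit, alreadySeen, penalty = 1.2) {
  if (!alreadySeen) return logit;
  return logit > 0 ? logit / penalty : logit * penalty;
}

console.log(repetitionPenalise(2.4, true));  // 2: a positive logit shrinks
console.log(repetitionPenalise(-1.0, true)); // -1.2: a negative logit is pushed further down
console.log(repetitionPenalise(2.4, false)); // 2.4: unseen tokens are untouched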
A Practical Tuning Playbook (No Guessing, No Vibes)
Here’s a simple workflow that works in real projects:
Step 1: Fix the prompt first (free wins)
Add these constraints:
- Output format: bullet list, table, numbered steps
- Hard limits: word count or max items
- Anti-dup rule: “Don’t repeat the same phrase; each point must add new information.”
- Banned phrases: “Avoid ‘in conclusion’, ‘overall’, ‘therefore’.”
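Put together, those four constraints fit in a single prompt. A hypothetical example (the topic is just a placeholder):

// Hypothetical prompt with all four constraint types baked in:
// format, hard limit, anti-dup rule, and banned filler phrases.
const prompt = `
Explain why a UK council tax bill can change mid-year.
Format: a numbered list of exactly 5 points, max 120 words total.
Don't repeat the same phrase; each point must add new information.
Avoid "in conclusion", "overall" and "therefore".
`;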
Step 2: Add penalties gradually
Start conservative, then adjust in 0.1 increments.
- If you see phrase loops, increase frequency_penalty.
- If you see one-idea spirals, increase presence_penalty.
- If you’re on open-source inference, try repetition_penalty before you rewrite the entire prompt.
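If you'd rather not tune by hand, a tiny sweep over a few values 0.1 apart makes the comparison concrete. A sketch assuming the OpenAI Node SDK, same as the full example later in this post (the prompt is a placeholder):

// Hypothetical sweep: run the same prompt at a few frequency_penalty values,
// 0.1 apart, and eyeball which setting kills the loops without going weird.
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const prompt = "List 6 distinct reasons UK energy bills vary month to month.";

for (const fp of [0.3, 0.4, 0.5, 0.6]) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    frequency_penalty: fp,
  });
  console.log(`--- frequency_penalty: ${fp} ---`);
  console.log(res.choices[0].message.content);
}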
Step 3: Measure, don’t just feel
A quick heuristic you can apply without tools:
- If a sentence “could be swapped” with an earlier sentence and still make sense… it’s probably redundant.
- If you can highlight the same adjective 5+ times in 300 words… your frequency penalty is too low.
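The second heuristic is easy to automate. A rough helper that counts word frequencies in a chunk of output (roughly 300 words) and flags anything appearing 5+ times:

// Rough redundancy check: count words of 4+ letters and flag the overused ones.
function overusedWords(text, threshold = 5) {
  const words = text.toLowerCase().match(/[a-z]{4,}/g) ?? [];
  const counts = new Map();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);
  return [...counts.entries()]
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]);
}

// Example: overusedWords(res.choices[0].message.content)

Run it on a few generations before and after changing frequency_penalty and you have a crude but honest signal.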
Battle-Tested Settings by Scenario
1) Long reports (e.g., “UK fintech trends in 2026”)
Goal: avoid repeating the same argument with different wording.
Suggested settings:
- frequency_penalty: 0.5–0.8
- presence_penalty: 0.2–0.5
Prompt add-ons:
- “Each section must introduce at least one new example (UK/EU).”
- “No repeated transition phrases.”
2) Marketing copy (short, punchy, no buzzword soup)
Suggested settings:
- frequency_penalty: 0.9–1.2
- presence_penalty: 0.1–0.4
Extra trick:
- Provide a banned word list (e.g., “innovative, cutting-edge, next-gen”).
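A hypothetical post-check for that banned-word trick: scan the generated copy and report any buzzwords that slipped through despite the prompt.

// Hypothetical buzzword check: if anything comes back, regenerate or raise frequency_penalty.
const banned = ["innovative", "cutting-edge", "next-gen", "seamless"];

function bannedHits(text) {
  const lower = text.toLowerCase();
  return banned.filter((word) => lower.includes(word));
}

// Example: bannedHits(output) → ["seamless"]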
3) Customer support (multi-turn, avoid copy/paste)
Suggested settings:
- frequency_penalty: 0.3–0.6
- presence_penalty: 0.2–0.5
Prompt add-ons:
- “Don’t repeat instructions already given; only add missing steps.”
- “Keep greeting to one short phrase per reply.”
4) Open-source inference stutter (local models)
Suggested settings:
- repetition_penalty: 1.15–1.30
- plus normal sampling choices (temperature/top-p) as needed
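If it helps, here are the four profiles above as one lookup table. The values sit inside the suggested ranges and are starting points, not universal truths:

// Starting-point penalty profiles for the scenarios above; tune in 0.1 steps from here.
const penaltyProfiles = {
  longReport:      { frequency_penalty: 0.6, presence_penalty: 0.3 },
  marketingCopy:   { frequency_penalty: 1.0, presence_penalty: 0.2 },
  customerSupport: { frequency_penalty: 0.4, presence_penalty: 0.3 },
  localInference:  { repetition_penalty: 1.2 }, // Hugging Face-style runtimes only
};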
Code: A Small Example You Can Actually Use
Here’s a minimal Node.js snippet that uses both penalties to reduce repetition when generating a short explainer (example topic: “why council tax bills can look confusing”). (I’m keeping it intentionally compact—paste, run, iterate.)
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const prompt = `
Write a concise UK-focused explanation (max 180 words) of why a council tax account balance
can look higher than expected. Use 4 bullet points. Avoid filler phrases like "overall" or
"in conclusion". Each bullet must introduce a distinct reason.
`;

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }],
  temperature: 0.6,       // moderate randomness; the penalties do the anti-repetition work
  frequency_penalty: 0.7, // tax tokens proportionally to how often they've already appeared
  presence_penalty: 0.3   // gentle nudge towards introducing new tokens/topics
});

console.log(res.choices[0].message.content);
What to tweak:
- Still seeing repeated phrasing? Push frequency_penalty → 0.8.
- Output feels like it’s jumping topics? Pull presence_penalty → 0.2.
- Text sounds unnatural? Lower both by 0.1.
(These penalty parameters are part of OpenAI’s API surface for chat/completions.)
The Most Common Penalty Mistakes (And How to Dodge Them)
Mistake 1: “Zero repetition” as a goal
Over-penalising forces the model into awkward synonyms (“this caffeinated beverage” instead of “coffee”). Fix: keep penalties moderate, and enforce structure with the prompt.
Mistake 2: Using penalties to compensate for vague prompts
Penalties are not a replacement for constraints. They’re a multiplier on good instructions.
Mistake 3: Copying settings between tasks
A marketing penalty profile will wreck academic writing (you want consistent terminology in papers).
Mistake 4: Forgetting model differences
Open-source models can be more sensitive to repetition controls; increase slowly and watch coherence.
A Simple Checklist Before You Ship
- Prompt has a format and a length cap
- Prompt includes an explicit no-dup rule
- Penalties start in the conservative range
- You adjusted in 0.1 steps, not giant jumps
- Output reads naturally to a human (not a thesaurus)
Final Thought
Penalties aren’t about “making the model obey”. They’re about nudging decoding away from the comfy, repetitive path and back into useful language.
Treat them like seasoning:
- Too little → bland, repetitive mush
- Too much → weird, inedible synonyms
- Just right → your output stops sounding like it’s writing to hit a word count.
Now go make your model shut up productively.