If you’ve ever asked an LLM for a “clear explanation” and got back the same sentence wearing three different hats, welcome to the club.

Repetition usually isn’t malice—it’s math. Decoding is probability, and the highest-probability path often loops through safe phrases (“therefore”, “in conclusion”, “from a broader perspective”) like a commuter stuck on the Manchester–London line with two signal failures and a mysterious “operational issue”.

Penalties are the knobs that break that loop.

Why LLMs Repeat (And Why It’s Not Your Fault)

Repetition happens for three very boring reasons:

1) Decoding follows the highest-probability trail

At each step, the model picks the next token based on probabilities. Once a phrase becomes likely, it can keep becoming likely (momentum + local coherence).
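
A toy sketch of that feedback loop (the model interface here is invented purely for illustration):

// Greedy decoding: always take the single most likely next token.
// Nothing in this loop pushes back once a phrase becomes the top choice,
// so "in conclusion" can keep winning step after step.
function greedyDecode(model, promptTokens, maxNewTokens) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxNewTokens; i++) {
    const logits = model.nextTokenLogits(tokens); // hypothetical interface
    const next = logits.indexOf(Math.max(...logits)); // argmax
    tokens.push(next);
  }
  return tokens;
}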

2) Training data contains a lot of templates

Web writing is full of ritual phrases. The model learns them because they appear everywhere—and because they “work” statistically.

3) Your prompt leaves the exit door open

If you don’t define scope, format, and constraints, the model will often “pad” to sound complete—by paraphrasing itself.

You can fix #3 with better instructions. You fix #1 and #2 with penalties.


The Penalty Trio (What They Actually Do)

Different platforms name these slightly differently, but the underlying idea is the same: change the odds of generating certain tokens.

Frequency penalty (token reuse tax)

In OpenAI-style APIs, frequency_penalty reduces the probability of tokens that have already appeared, proportionally to how often they appeared. Positive values make the model less likely to repeat itself.

Use it when: the output keeps reusing the same words and stock phrases, or whole sentences come back lightly reworded.

Typical starting range: 0.3–0.7 on a scale that officially runs from −2.0 to 2.0.
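
Under the hood the effect is roughly this (a sketch of the count-proportional form OpenAI's docs describe, not their actual code):

// Frequency penalty: subtract penalty * count from a token's logit,
// so a token gets progressively less likely each time it reappears.
function applyFrequencyPenalty(logit, countSoFar, frequencyPenalty) {
  return logit - frequencyPenalty * countSoFar;
}

// Example: "overall" already used 3 times with frequency_penalty = 0.7
// means its logit drops by 2.1 before the next token is sampled.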

Presence penalty (topic-hopping nudge)

presence_penalty penalizes tokens simply for having appeared at all, which encourages introducing new tokens/topics rather than staying on the same rails.

Use it when: the model keeps circling the same idea and you want it to move on to new points, angles, or examples.

Typical starting range: 0.1–0.4 (same −2.0 to 2.0 scale as the frequency penalty).
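
The two stack: frequency scales with how often a token has appeared, presence is a flat one-off hit. Roughly (same caveat, a sketch rather than the real implementation):

// Presence penalty: a flat deduction the moment a token has appeared at all.
// Combined with the frequency term, the adjusted logit looks like this:
function adjustLogit(logit, countSoFar, frequencyPenalty, presencePenalty) {
  const presenceHit = countSoFar > 0 ? presencePenalty : 0;
  return logit - frequencyPenalty * countSoFar - presenceHit;
}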

Repetition penalty (open-source classic)

In Hugging Face Transformers-style generation, repetition_penalty is a single multiplier applied to the scores of previously generated tokens. 1.0 means no penalty; values above 1.0 penalise repeats.

Use it when: a local model stutters, loops on the same phrase, or echoes chunks of the prompt back at you.

Typical starting range: 1.05–1.2 (1.0 is off; push much higher and coherence starts to wobble).
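
The mechanics are multiplicative rather than subtractive. A rough sketch of the usual formulation (positive scores get divided, negative ones multiplied, so repeats lose either way):

// Transformers-style repetition penalty: for tokens already in the output,
// divide positive logits by the penalty and multiply negative ones by it,
// which pushes repeats down regardless of sign.
function applyRepetitionPenalty(logit, alreadyGenerated, penalty) {
  if (!alreadyGenerated) return logit;
  return logit > 0 ? logit / penalty : logit * penalty;
}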


A Practical Tuning Playbook (No Guessing, No Vibes)

Here’s a simple workflow that works in real projects:

Step 1: Fix the prompt first (free wins)

Add these constraints: a hard length limit, an explicit structure (e.g. "4 bullet points"), a short ban list of filler phrases, and a rule that each point must cover something distinct. (The prompt in the code example further down shows all four in action.)

Step 2: Add penalties gradually

Start conservative, then adjust by 0.1 increments.
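
If you'd rather compare than guess, a tiny sweep does the job (a sketch assuming the official openai Node client; the prompt and model name are just examples):

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const prompt = "Explain in 120 words why broadband contracts often get pricier mid-term.";

// Same prompt, a few frequency_penalty values, eyeball the differences.
for (const fp of [0.0, 0.3, 0.6]) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    temperature: 0.6,
    frequency_penalty: fp
  });
  console.log(`--- frequency_penalty ${fp} ---`);
  console.log(res.choices[0].message.content);
}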

Step 3: Measure, don’t just feel

A quick heuristic you can apply without tools: if the same three-or-four-word phrase shows up more than twice in a short output, raise the frequency penalty; if every paragraph restates the previous one's point, raise the presence penalty.
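
And if you do want a tiny tool, counting repeated n-grams gets you most of the way (a rough sketch, not a proper evaluation metric):

// Count how often each 3-word phrase appears; anything over 1 is a repeat.
function repeatedTrigrams(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i + 2 < words.length; i++) {
    const gram = words.slice(i, i + 3).join(" ");
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1);
}

console.log(repeatedTrigrams("overall this is fine and overall this is fine"));
// => [ [ 'overall this is', 2 ], [ 'this is fine', 2 ] ]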


Battle-Tested Settings by Scenario

1) Long-form explainers (essays, reports, structured answers)

Goal: avoid repeating the same argument with different wording.

Suggested settings: frequency_penalty around 0.5–0.7, presence_penalty around 0.2–0.3, temperature around 0.5–0.7.

Prompt add-ons: require that each section or bullet introduces a distinct point, and name the filler phrases you don't want to see.

2) Marketing copy (short, punchy, no buzzword soup)

Suggested settings: frequency_penalty around 0.5, presence_penalty around 0.4–0.6, temperature a touch higher (0.7–0.8) for variety.

Extra trick: list the buzzwords you don't want by name ("synergy", "game-changing", "cutting-edge"); an explicit ban in the prompt beats hoping a penalty catches them.

3) Customer support (multi-turn, avoid copy/paste)

Suggested settings: frequency_penalty around 0.3–0.5, presence_penalty low (0.0–0.2) so product terminology stays consistent across turns.

Prompt add-ons: respond to the customer's latest message specifically, and don't reuse the same opening or closing sentence from earlier turns.

4) Open-source inference stutter (local models)

Suggested settings: repetition_penalty around 1.1–1.2, raised in small steps (0.05 at a time) while you watch coherence.


Code: A Small Example You Can Actually Use

Here’s a minimal Node.js snippet that uses both penalties to reduce repetition when generating a short explainer (example topic: “why council tax bills can look confusing”). (I’m keeping it intentionally compact—paste, run, iterate.)

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The constraints live in the prompt; the penalties just back them up.
const prompt = `
Write a concise UK-focused explanation (max 180 words) of why a council tax account balance
can look higher than expected. Use 4 bullet points. Avoid filler phrases like "overall" or
"in conclusion". Each bullet must introduce a distinct reason.
`;

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }],
  temperature: 0.6,        // some variety, nothing wild
  frequency_penalty: 0.7,  // tax on reusing the same tokens
  presence_penalty: 0.3    // mild nudge towards new points
});

console.log(res.choices[0].message.content);

What to tweak: the two penalty values (move them in 0.1 steps, as in Step 2), the temperature, and the word and bullet limits in the prompt.

(These penalty parameters are part of OpenAI’s API surface for chat/completions.)


The Most Common Penalty Mistakes (And How to Dodge Them)

Mistake 1: “Zero repetition” as a goal

Over-penalising forces the model into awkward synonyms (“this caffeinated beverage” instead of “coffee”). Fix: keep penalties moderate, and enforce structure with the prompt.

Mistake 2: Using penalties to compensate for vague prompts

Penalties are not a replacement for constraints. They’re a multiplier on good instructions.

Mistake 3: Copying settings between tasks

A marketing penalty profile will wreck academic writing (you want consistent terminology in papers).

Mistake 4: Forgetting model differences

Open-source models can be more sensitive to repetition controls; increase slowly and watch coherence.


A Simple Checklist Before You Ship

- Scope, format, and banned phrases are spelled out in the prompt.
- Penalties were added gradually, one knob at a time.
- You checked the output for repeated phrases instead of just skimming it.
- Settings were tuned for this task, not copied from the last one.
- You re-checked behaviour after switching models, especially open-source ones.

Final Thought

Penalties aren’t about “making the model obey”. They’re about nudging decoding away from the comfy, repetitive path and back into useful language.

Treat them like seasoning: a little sharpens the output; too much and it's all you can taste.

Now go make your model shut up productively.