
Almost every Python project eventually needs a config. Everyone starts with a simple JSON or YAML — that's the normal thing to do. But once the app grows and goes to production, configs start fighting back.

You write destination_port in code, but in the test config, someone accidentally types destiantion_port. Instead of failing fast, the app silently falls back to the default port and only crashes in prod. Or a new parameter gets added, and suddenly half of the configs in the repo are only "valid by accident": some keys are copied and pasted, others are left over from another test, and no one is really sure which configs actually work.

My name is Dmitry, I'm a software engineer and Python mentor. I've had to clean up this mess more than once, both in projects I joined and in my mentees' code. These problems are everywhere (GitHub issues are full of them). So in this article, I'll show a simple and proven way with Pydantic v2 that saves both your nerves and precious hours of life.

Why configs become a problem in Python projects

Let’s say you’re developing a Python application. At some point it might make sense to introduce a config file that would somehow affect or control the behavior of your app. To start off, you’ll probably need just a simple config with a simple structure. What’s the simplest thing you can do to implement that? Of course, just slap together a JSON file, read it into a dict, and you’re good to go. Quick and neat!

However, as your application grows, so does the config file structure. And this is when some of the limitations of this approach start to reveal themselves. Let’s showcase a few of them.

Let’s assume you’re creating your own VPN service, and you’re developing a client app that will run on your users’ machines. You obviously want to allow the user to specify what server and port they want to connect to and how. So you expose those values in a config file like this. Let's say that by default we want to use port 4646.

cfg_example = {
    "destination_server": "example.com",
    "destination_port": 1234
}

DEFAULT_PORT = 4646

class VPNClient:
    def __init__(self, destination_server: str, destination_port: int) -> None:
        self.destination_server = destination_server
        self.destination_port = destination_port
        print(f"Initialized VPNClient with {self.destination_server=} {self.destination_port=}")

if __name__ == "__main__":
    cfg = read_from_json()
    client = VPNClient(destination_server=cfg["destination_server"], destination_port=cfg.get("destination_port", DEFAULT_PORT))
    client.do_stuff()

Not bad! However, this implementation opens the door to potential issues. Think what happens when a user makes a typo in the word "destination_port". Our VPNClient will try to use the default port value instead of the one the user was trying to specify, and will probably fail to connect. For now, let's just hope the user won't make that mistake.
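To make that failure mode concrete, here is a minimal sketch of the silent fallback. The dict stands in for whatever read_from_json() would return, and the misspelled key is the typo from the introduction.

```python
DEFAULT_PORT = 4646

# A config as a user might write it, with a typo in the key name.
cfg_with_typo = {
    "destination_server": "example.com",
    "destiantion_port": 1234,  # misspelled on purpose
}

# dict.get() cannot tell "key is missing" from "key is misspelled",
# so the value the user typed is ignored without any warning.
port = cfg_with_typo.get("destination_port", DEFAULT_PORT)
print(port)  # 4646, not the 1234 the user asked for
```

Nothing in this code path ever looks at the unknown key, so the mistake only surfaces later as a failed connection.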

Let's move on. Now our VPN client probably needs some encryption, so we want to configure that too. Since our application is still in the development stage, we’ll start off by supporting just one encryption method. For the sake of demonstration, let's say we want to use an encryption method called "cryptfoo" that requires the user to specify a password. So we adjust our config file to reflect that.

cfg_example = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "password": "admin123"
}
DEFAULT_PORT = 4646
class VPNClient:
    def __init__(self, destination_server: str, destination_port: int, password: str) -> None:
        ...
if __name__ == "__main__":
    cfg = read_from_json()
    client = VPNClient(destination_server=cfg["destination_server"], destination_port=cfg.get("destination_port", DEFAULT_PORT), password=cfg["password"])
    client.do_stuff()

So far, so good! Now let's add a second encryption method "cryptbar" that requires the user to specify an encryption key. To reflect that in our config, we need to add two new parameters: method and key. However, look at what happens to the __init__() method arguments of the VPNClient class.

from typing import Optional
cfg_example1 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptfoo",
    "password": "admin123"
}
cfg_example2 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptbar",
    "key": "b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEb"
}
DEFAULT_PORT = 4646
class VPNClient:
    def __init__(self, destination_server: str, destination_port: int, method: str, password: Optional[str], key: Optional[str]) -> None:
        ...
if __name__ == "__main__":
    cfg = read_from_json()
    client = VPNClient(
        destination_server=cfg["destination_server"],
        destination_port=cfg.get("destination_port", DEFAULT_PORT),
        method=cfg["method"],
        password=cfg.get("password"),
        key=cfg.get("key")
    )
    client.do_stuff()

Note that this piece of code has an important difference from the previous iteration. We had to make both "password" and "key" optional, so now it’s entirely possible to have a config that doesn’t specify one of them. That introduces an undesirable implicit assumption about the structure of the config. This code allows both "password" and "key" to be missing independently of each other, which is very different from a valid config, where exactly one of them is expected. Both of the following broken configs get through without complaint:

bad_cfg_example1 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptfoo",
    "password": "admin123",
    "key": "b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEb"
}
bad_cfg_example2 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptbar"
}
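As a rough check, the sketch below feeds both broken configs through the same construction logic as the __main__ block above. The build_client helper is hypothetical (it only wraps that logic in a function), and the constructor body is assumed to store each argument as an attribute.

```python
from typing import Optional

DEFAULT_PORT = 4646

class VPNClient:
    def __init__(self, destination_server: str, destination_port: int,
                 method: str, password: Optional[str], key: Optional[str]) -> None:
        # Assumed body: store every argument as-is, no validation.
        self.destination_server = destination_server
        self.destination_port = destination_port
        self.method = method
        self.password = password
        self.key = key

def build_client(cfg: dict) -> VPNClient:
    # Hypothetical helper mirroring the __main__ block above.
    return VPNClient(
        destination_server=cfg["destination_server"],
        destination_port=cfg.get("destination_port", DEFAULT_PORT),
        method=cfg["method"],
        password=cfg.get("password"),
        key=cfg.get("key"),
    )

# "cryptfoo" with a stray key: accepted.
both = build_client({
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptfoo",
    "password": "admin123",
    "key": "b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEb",
})

# "cryptbar" without its required key: also accepted.
neither = build_client({
    "destination_server": "example.com",
    "destination_port": 1234,
    "method": "cryptbar",
})

print(both.key is not None, neither.key is None)
```

Neither call raises, so the client object exists and looks healthy until something finally tries to use the missing key.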

This leads to a number of problems:

We lose the ability to detect a broken config early. By letting our code operate on a potentially broken config, we just defer verification to the moment we actually use the config values. That’s bad for many reasons. For example, the app can acquire scarce resources, make network connections, write to disk, and waste system and user time, only to realize halfway through that the config is broken. That, first and foremost, hurts the user experience.
The code semantics no longer reflect the actual logic. Imagine a junior dev joins your team and tries to understand this code. The impression they get is that "destination_server", "destination_port", and "method" fields are required, while "password" and "key" are independently optional. But that’s just wrong. On a small scale it might not look critical, but in a larger codebase it quickly becomes impossible to keep track of all the implicit assumptions.
Nothing prevents configs from drifting. Let’s say our junior dev is still trying to figure things out and decides to check the configs in the unit test suite (you wrote them, right?). The "bad_cfg_example1" from above could easily be sitting in one of the tests. That’s especially likely in the test we added for the "cryptbar" method: the config was probably copy-pasted from the "cryptfoo" test and only minimally adjusted. Removing "password" wasn’t necessary for the test to pass, so now it’s there to confuse everyone indefinitely.
The type system no longer reflects the actual logic either. Think about how these config values will be used. For production-grade Python I always use an IDE with type checking enabled, and so should you. Let’s say we have a "connect_cryptbar" function and want to call it when "cryptbar" is specified. Here’s what happens:

def connect_cryptbar(destination_server: str, destination_port: int, key: str) -> None:
    ...

class VPNClient:
    def __init__(self, destination_server: str, destination_port: int, method: str, password: Optional[str], key: Optional[str]) -> None:
        ... # just does self.val = val for every field in the arguments

    def do_stuff(self):
        if self.method == "cryptbar":
            # The type checker flags the call below:
            #   Argument of type "str | None" cannot be assigned to parameter "key" of type "str" in function "connect_cryptbar"
            #     Type "str | None" is not assignable to type "str"
            #       "None" is not assignable to "str"
            connect_cryptbar(destination_server=self.destination_server, destination_port=self.destination_port, key=self.key)

The type checker in my IDE is flagging an issue. A valid config that uses the "cryptbar" method would never have its "key" unspecified, but our type system doesn’t reflect that fact, even if we already checked this assumption at the very beginning. At that point the options are: either suppress type checking (obviously a bad practice) or add an assert to convince the type checker that "key" is not None. Both only solve the problem locally: the next time the same field is used somewhere else, the exact same issue and the exact same choice come up again.
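For illustration, the assert workaround could look like the sketch below. It assumes the constructor simply stores its arguments, and connect_cryptbar is a stand-in that returns a string so the result can be inspected; the real function would open a connection.

```python
from typing import Optional

def connect_cryptbar(destination_server: str, destination_port: int, key: str) -> str:
    # Stand-in for the real connection logic (assumption for this sketch).
    return f"{destination_server}:{destination_port} key={key[:4]}..."

class VPNClient:
    def __init__(self, destination_server: str, destination_port: int,
                 method: str, password: Optional[str], key: Optional[str]) -> None:
        self.destination_server = destination_server
        self.destination_port = destination_port
        self.method = method
        self.password = password
        self.key = key

    def do_stuff(self) -> str:
        if self.method == "cryptbar":
            # Narrows Optional[str] to str for the type checker.
            # The same assert has to be repeated wherever self.key is used.
            assert self.key is not None, "cryptbar requires a key"
            return connect_cryptbar(
                destination_server=self.destination_server,
                destination_port=self.destination_port,
                key=self.key,
            )
        raise NotImplementedError(self.method)

client = VPNClient("example.com", 1234, "cryptbar", None, "secretkey")
print(client.do_stuff())
```

The assert silences the checker at this one call site, but a config without a key still constructs fine and only fails once do_stuff() runs.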

The same logic applies, though to a lesser extent, to the "method" field. For now we just treat it as a string, but in reality it’s better represented as an Enum (or StrEnum).
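A minimal sketch of that idea, assuming the name EncryptionMethod (it does not appear in the code above). StrEnum only exists from Python 3.11 onwards, so this version uses the str mixin, which also works on older interpreters.

```python
from enum import Enum

class EncryptionMethod(str, Enum):
    # Hypothetical enum listing the two methods used in this article.
    CRYPTFOO = "cryptfoo"
    CRYPTBAR = "cryptbar"

# Converting the raw config string validates it at the same time.
method = EncryptionMethod("cryptbar")
print(method is EncryptionMethod.CRYPTBAR)

# Thanks to the str mixin, comparisons with plain strings still work.
print(method == "cryptbar")

# An unknown method name is rejected immediately with a ValueError.
try:
    EncryptionMethod("cryptbaz")
except ValueError as exc:
    print(f"rejected: {exc}")
```

With an enum, a misspelled method name fails at the point of conversion instead of falling through every if branch later on.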

Adding Pydantic to your project

Now, let’s try to fix all of that by adding a bit of Pydantic.


Pydantic is a data-validation library for Python. Pydantic models are ordinary Python classes whose type annotations declare the fields; instances are built from dictionary-like input and expose the values as typed attributes. Let’s start by defining a model for the config structure we had at the very beginning.

import pydantic
DEFAULT_PORT = 4646
class VPNClientConfig(pydantic.BaseModel):
    destination_server: str = pydantic.Field(description="VPN Server to connect to")
    destination_port: int = pydantic.Field(default=DEFAULT_PORT, description="VPN Server port to connect to")
    model_config = pydantic.ConfigDict(
        extra="forbid", # prohibits specifying any values except those defined above
        frozen=True,    # prohibits modifying fields after the model has been instantiated
    )
cfg_example = {
    "destination_server": "example.com",
    "destination_port": 1234
}
class VPNClient:
    def __init__(self, cfg: VPNClientConfig) -> None:
        self.cfg = cfg
        print(f"Initialized VPNClient with {self.cfg.destination_server=} {self.cfg.destination_port=}")
if __name__ == "__main__":
    cfg_content = read_from_json()
    cfg = VPNClientConfig.model_validate(cfg_content)
    client = VPNClient(cfg=cfg)
    client.do_stuff()

Let’s dissect what’s happening here. First, we define a VPNClientConfig class derived from pydantic.BaseModel, as all Pydantic models do. Then we define two fields on that model: one is a string and the other is an integer. Any instance of VPNClientConfig will have those two fields. We also define a default value for one of them and set up the "model_config". The default and extra="forbid" affect the parsing process that we’ll look at next, while frozen=True rejects attribute assignment on the instance after it has been created.

Skipping ahead to the config reading, we see a call to the model_validate method of the VPNClientConfig class. This method tries to parse a Python object (usually a dict) according to the model definition. By default, all fields on the model are required unless they have a default value. If parsing fails, an exception is raised. If it succeeds, you get back an instance of VPNClientConfig.
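Here is a small sketch of those rules in action, using a trimmed copy of the model (descriptions left out for brevity). It shows the default being applied and a required field being reported as missing.

```python
import pydantic

DEFAULT_PORT = 4646

class VPNClientConfig(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)

    destination_server: str                # required: no default
    destination_port: int = DEFAULT_PORT   # optional: has a default

# The port is omitted, so the default is used.
cfg = VPNClientConfig.model_validate({"destination_server": "example.com"})
print(cfg.destination_port)  # 4646

# The required server is omitted, so validation raises.
errors = []
try:
    VPNClientConfig.model_validate({"destination_port": 1234})
except pydantic.ValidationError as exc:
    # errors() returns a list of dicts describing each problem.
    errors = exc.errors()

for err in errors:
    print(err["type"], err["loc"])
```

Each entry returned by errors() carries the error type and the location of the offending field, which is handy for turning a failed parse into a readable message for the user.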

Even at this stage there are already some benefits from using Pydantic. First of all, since the "model_config" specifies extra="forbid", we don’t need to worry about typos in parameter names: parsing will fail if any keys aren’t explicitly defined by the model. Second, it gives us a natural place to document what each field means. By default, Pydantic doesn’t do anything fancy with the "description" values, but at least the info is stored in the type itself. This info can be used, for example, to implement custom code that automatically puts it into your tool’s help output.
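Both points can be sketched in a few lines. The first part replays the typo from the introduction; the second is one possible way to turn the stored descriptions into help text. The render_help function is a hypothetical helper written for this example, not something Pydantic provides.

```python
import pydantic

class VPNClientConfig(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)

    destination_server: str = pydantic.Field(description="VPN Server to connect to")
    destination_port: int = pydantic.Field(default=4646, description="VPN Server port to connect to")

# 1. A misspelled key is now an error instead of a silent fallback.
typo_errors = []
try:
    VPNClientConfig.model_validate(
        {"destination_server": "example.com", "destiantion_port": 1234}
    )
except pydantic.ValidationError as exc:
    typo_errors = exc.errors()
print(typo_errors[0]["type"], typo_errors[0]["loc"])

# 2. Field metadata is available on the class via model_fields.
def render_help(model: type[pydantic.BaseModel]) -> str:
    # Hypothetical helper: one line per field with its description.
    lines = []
    for name, field in model.model_fields.items():
        suffix = "required" if field.is_required() else f"default: {field.default!r}"
        lines.append(f"{name} ({suffix}): {field.description}")
    return "\n".join(lines)

print(render_help(VPNClientConfig))
```

Because the help text is generated from the model itself, it cannot fall out of date when a field is added or renamed.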

Adding strict models and discriminated unions with Pydantic

Now, let’s look at the real power of Pydantic by using it to implement the most advanced version of the app we’ve built so far.

import pydantic
from typing import Literal, Union, Annotated
DEFAULT_PORT = 4646
class StrictBaseModel(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(
        extra="forbid", # prohibits specifying any values except those defined above
        frozen=True,    # prohibits modifying fields after the model has been instantiated
    )
class CryptFooConfig(StrictBaseModel):
    method: Literal["cryptfoo"] = pydantic.Field(default="cryptfoo", description="Use cryptfoo encryption")
    password: str = pydantic.Field(description="Password for cryptfoo encryption")
class CryptBarConfig(StrictBaseModel):
    method: Literal["cryptbar"] = pydantic.Field(default="cryptbar", description="Use cryptbar encryption")
    key: str = pydantic.Field(description="Encryption key for cryptbar encryption")
class VPNClientConfig(StrictBaseModel):
    destination_server: str = pydantic.Field(description="VPN Server to connect to")
    destination_port: int = pydantic.Field(default=DEFAULT_PORT, description="VPN Server port to connect to")
    encryption: Annotated[
        Union[
            CryptFooConfig,
            CryptBarConfig
        ],
        pydantic.Field(discriminator="method", description="Encryption config to use"),
    ]
cfg_example1 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "encryption": {
        "method": "cryptfoo",
        "password": "admin123"
    }
}
cfg_example2 = {
    "destination_server": "example.com",
    "destination_port": 1234,
    "encryption": {
        "method": "cryptbar",
        "key": "b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEb"
    }
}
def connect_cryptbar(destination_server: str, destination_port: int, cryptbar_config: CryptBarConfig) -> None:
    ...
class VPNClient:
    def __init__(self, cfg: VPNClientConfig) -> None:
        self.cfg = cfg
    def do_stuff(self):
        if isinstance(self.cfg.encryption, CryptBarConfig):
            connect_cryptbar(destination_server=self.cfg.destination_server, destination_port=self.cfg.destination_port, cryptbar_config=self.cfg.encryption)
if __name__ == "__main__":
    cfg_content = read_from_json()
    cfg = VPNClientConfig.model_validate(cfg_content)
    client = VPNClient(cfg=cfg)
    client.do_stuff()

There’s a lot going on here, so let me explain some of the new concepts. First, we put some common Pydantic settings into a separate class that we then derive all our models from. Now VPNClientConfig has a required field "encryption" that is a Union of two other Pydantic models, which means it can be either one of them. It’s also annotated with a pydantic.Field that attaches a description and something called a discriminator. Adding the discriminator makes the field a discriminated union (see the Pydantic docs on unions).

The problem with parsing Unions is that it’s not clear which model a given input should match. What if it matches more than one? Pydantic has a couple of strategies for that, and discriminated unions are, in my opinion, the best one. The user specifies which field Pydantic should look at to decide which model to validate against. In this case that field is "method", which makes a lot of sense. Each of the specific encryption configs defines a Literal value for "method", and Pydantic compares against it. As a result, if the source config specifies "method": "cryptbar", the rest of the object is validated against CryptBarConfig. If the "method" doesn’t match any of the models, validation fails.
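To watch the discriminator at work in isolation, the sketch below validates just the encryption part of the config. It uses pydantic.TypeAdapter, which lets you validate against an annotated type without wrapping it in a model; the EncryptionConfig alias is a name introduced for this example.

```python
import pydantic
from typing import Annotated, Literal, Union

class StrictBaseModel(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)

class CryptFooConfig(StrictBaseModel):
    method: Literal["cryptfoo"] = "cryptfoo"
    password: str

class CryptBarConfig(StrictBaseModel):
    method: Literal["cryptbar"] = "cryptbar"
    key: str

# Hypothetical alias for the union used by the "encryption" field.
EncryptionConfig = Annotated[
    Union[CryptFooConfig, CryptBarConfig],
    pydantic.Field(discriminator="method"),
]
adapter = pydantic.TypeAdapter(EncryptionConfig)

# "method" selects the model; the rest is validated against it.
parsed = adapter.validate_python({"method": "cryptbar", "key": "abc"})
print(type(parsed).__name__)  # CryptBarConfig

# A method that matches none of the Literal values is rejected.
unknown_rejected = False
try:
    adapter.validate_python({"method": "cryptbaz", "key": "abc"})
except pydantic.ValidationError:
    unknown_rejected = True
print(unknown_rejected)  # True
```

Since only the selected model is tried, the resulting error messages talk about that one model rather than listing failures for every member of the union.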

Another nice thing here is that we can pass CryptBarConfig to connect_cryptbar directly. If we later change the CryptBarConfig model definition, the change will automatically be reflected both in the config parsing process and in the object that connect_cryptbar receives.

Now let’s see how the new implementation holds up against the problems I listed earlier:

The config is fully validated before it ever reaches the actual logic.
The parsing rules are dictated by the model definition, which directly reflects the reality of different encryption methods and their required inputs.
Config drift is no longer possible, since any unexpected fields trigger a validation error.
The type system defined by the Pydantic model is exactly the same thing that drives the parsing logic, so they stay in sync.
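As a sanity check, here is a sketch that runs the two broken configs from earlier through the new model, rewritten into the nested "encryption" layout. The error_types helper is hypothetical and exists only to keep the example short; descriptions are omitted from the models for the same reason.

```python
import pydantic
from typing import Annotated, Literal, Union

class StrictBaseModel(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)

class CryptFooConfig(StrictBaseModel):
    method: Literal["cryptfoo"] = "cryptfoo"
    password: str

class CryptBarConfig(StrictBaseModel):
    method: Literal["cryptbar"] = "cryptbar"
    key: str

class VPNClientConfig(StrictBaseModel):
    destination_server: str
    destination_port: int = 4646
    encryption: Annotated[
        Union[CryptFooConfig, CryptBarConfig],
        pydantic.Field(discriminator="method"),
    ]

def error_types(raw: dict) -> list[str]:
    # Hypothetical helper: returns the error types, or [] if the config is valid.
    try:
        VPNClientConfig.model_validate(raw)
    except pydantic.ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []

base = {"destination_server": "example.com", "destination_port": 1234}

# cryptfoo with a leftover key: the extra field is rejected.
stray_key = error_types({**base, "encryption": {
    "method": "cryptfoo", "password": "admin123", "key": "b3Blbn"}})

# cryptbar without its key: the missing field is reported.
no_key = error_types({**base, "encryption": {"method": "cryptbar"}})

print(stray_key, no_key)

# frozen=True: a validated config cannot be modified afterwards.
cfg = VPNClientConfig.model_validate(
    {**base, "encryption": {"method": "cryptbar", "key": "b3Blbn"}})
try:
    cfg.destination_port = 1
except pydantic.ValidationError as exc:
    print(exc.errors()[0]["type"])
```

Both broken configs now fail at load time, before the client has been constructed or any resources have been touched.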

Running the advanced version

Here’s a quick demo of what happens when we pass the JSON configs from above into the program, with do_stuff() changed to just print the model. Note that the configs don’t actually have to be JSON — they can be anything Python can parse into a dictionary, for example YAML.

I ran this test with Python 3.10.12 and Pydantic 2.11.9. Here’s the exact source code I used:

import sys
import json
import pydantic
from pprint import pprint
from typing import Literal, Union, Annotated
def read_from_json(path) -> dict:
    with open(path) as f:
        return json.load(f)
DEFAULT_PORT = 4646
class StrictBaseModel(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(
        extra="forbid", # prohibits specifying any values except those defined above
        frozen=True,    # prohibits modifying fields after the model has been instantiated
    )
class CryptFooConfig(StrictBaseModel):
    method: Literal["cryptfoo"] = pydantic.Field(default="cryptfoo", description="Use cryptfoo encryption")
    password: str = pydantic.Field(description="Password for cryptfoo encryption")
class CryptBarConfig(StrictBaseModel):
    method: Literal["cryptbar"] = pydantic.Field(default="cryptbar", description="Use cryptbar encryption")
    key: str = pydantic.Field(description="Encryption key for cryptbar encryption")
class VPNClientConfig(StrictBaseModel):
    destination_server: str = pydantic.Field(description="VPN Server to connect to")
    destination_port: int = pydantic.Field(default=DEFAULT_PORT, description="VPN Server port to connect to")
    encryption: Annotated[
        Union[
            CryptFooConfig,
            CryptBarConfig
        ],
        pydantic.Field(discriminator="method", description="Encryption config to use"),
    ]
class VPNClient:
    def __init__(self, cfg: VPNClientConfig) -> None:
        self.cfg = cfg
    def do_stuff(self):
        pprint(self.cfg)
if __name__ == "__main__":
    cfg_content = read_from_json(sys.argv[1])
    cfg = VPNClientConfig.model_validate(cfg_content)
    client = VPNClient(cfg=cfg)
    client.do_stuff()
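As a variation on the script above, Pydantic v2 can also parse JSON text directly with model_validate_json, which skips the intermediate json.load step. The sketch below assumes the config arrives as a string (here an inline literal instead of a file) and uses model_dump to turn the validated model back into plain dictionaries.

```python
import pydantic
from typing import Annotated, Literal, Union

class StrictBaseModel(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="forbid", frozen=True)

class CryptFooConfig(StrictBaseModel):
    method: Literal["cryptfoo"] = "cryptfoo"
    password: str

class CryptBarConfig(StrictBaseModel):
    method: Literal["cryptbar"] = "cryptbar"
    key: str

class VPNClientConfig(StrictBaseModel):
    destination_server: str
    destination_port: int = 4646
    encryption: Annotated[
        Union[CryptFooConfig, CryptBarConfig],
        pydantic.Field(discriminator="method"),
    ]

# Inline stand-in for the contents of a JSON config file.
raw_json = """
{
    "destination_server": "example.com",
    "destination_port": 1234,
    "encryption": {"method": "cryptfoo", "password": "admin123"}
}
"""

# Parse and validate in one step.
cfg = VPNClientConfig.model_validate_json(raw_json)
print(type(cfg.encryption).__name__)

# model_dump() converts the model, including nested models, to dicts.
dumped = cfg.model_dump()
print(dumped["encryption"]["method"])
```

For other formats such as YAML, the approach from the script still applies: load the file into a dictionary with the parser of your choice and hand the result to model_validate.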

Conclusion

Depending on your goals and the type of application you’re building, the complexity of your config can vary a lot. Sometimes it’s just a couple of values in a file, and sometimes it looks more like Kubernetes — a huge structure with many element types, each with its own parameters. Whatever the size, defining the structure with Pydantic models is a simple way to improve robustness, code quality, and maintainability.

The End of Config Hell in Python, Thanks to Pydantic v2
