Let me set the scene.
You've shipped three .NET applications in the last two years. Each one has a DbContext, a generic repository, probably a Unit of Work wrapper. Maybe you even went full CQRS on the last one because you'd watched enough conference talks to feel guilty about not doing it.
And every single time, you wrote the same scaffolding. Different entity names. Different connection string. Same bones.
At some point you stop and think — why am I doing this again?
That question is what led me to build a reusable data access library, packaged as a NuGet package. And then, out of curiosity and a healthy amount of scepticism, I handed it to GPT-5.3 Codex for a full architectural review.
Here's what happened.
The Library: What It Does and Why It's Non-Trivial
The whole premise is simple: take every piece of data access boilerplate that every .NET developer writes over and over again — DbContext setup, repository abstraction, Unit of Work, CQRS separation — and package it once. Reuse it everywhere.
The API is intentionally fluent. You want to query? DataAccessor.Queries<T>().GetAll(). You want to write? DataAccessor.Commands<T>().Add(entity). Ready to persist? .Commit(). Clean, readable, consistent across every project that pulls in the package.
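To make that call flow concrete, here is a self-contained, in-memory sketch. This is not the library's implementation: DataAccessor here is a toy static facade, and Store, InMemoryQueries, InMemoryCommands, and Invoice are stand-ins I invented purely to mirror the fluent surface described above.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-ins mirroring the fluent surface described above.
// The real library sits on EF Core; a List<T> is enough for the sketch.
public static class Store<T>
{
    public static readonly List<T> Committed = new();
    public static readonly List<T> Pending = new();
}

public class InMemoryQueries<T>
{
    public IReadOnlyList<T> GetAll() => Store<T>.Committed;
}

public class InMemoryCommands<T>
{
    public InMemoryCommands<T> Add(T entity)
    {
        Store<T>.Pending.Add(entity);
        return this; // fluent: lets the caller chain straight into Commit()
    }

    public void Commit()
    {
        Store<T>.Committed.AddRange(Store<T>.Pending);
        Store<T>.Pending.Clear();
    }
}

public static class DataAccessor
{
    public static InMemoryQueries<T> Queries<T>() => new();
    public static InMemoryCommands<T> Commands<T>() => new();
}

public record Invoice(string Number);

public static class Demo
{
    public static void Main()
    {
        // Write side: stage a change, then persist explicitly.
        DataAccessor.Commands<Invoice>().Add(new Invoice("A-1001")).Commit();

        // Read side: no staging, no commit needed.
        Console.WriteLine(DataAccessor.Queries<Invoice>().GetAll().Count);
    }
}
```

The point of the shape is the read/write split: queries never touch pending state, and nothing reaches "the database" until an explicit Commit().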
Sounds straightforward. But here's where it gets interesting.
In a normal application, your DbContext is concrete and known at compile time. It maps directly to your specific database schema. Easy. But a reusable library can't know your schema in advance. By definition, it can't. The concrete DbContext only exists in the consuming application, resolved at runtime. The library has to be built around that constraint.
This changes everything architecturally. It's the difference between designing a house and designing a blueprint system that lets other people design houses. The rules are different. The trade-offs are different. And the code will look different from what you'd write for a single application — intentionally so.
using System;

namespace Alexis.Infrastructure.DatabaseAccess
{
    // Q = query executor, C = command executor. Note what is absent:
    // no concrete DbContext is referenced anywhere in this class.
    public abstract class DataAccessorBase<Q, C>
    {
        protected Q GenericDbQueriesExec;
        protected C GenericDbCommandExec;
        protected DbAccessPatternWrapper _dbPatternWrapper;

        public DataAccessorBase(Q genericDbQueries, C genericDbCommands)
        {
            GenericDbQueriesExec = genericDbQueries;
            GenericDbCommandExec = genericDbCommands;
        }

        /// <summary>
        /// Uses reflection to build the query and command executors from
        /// the wrapper supplied by the consuming application.
        /// </summary>
        public DataAccessorBase(DbAccessPatternWrapper dbPatternWrapper)
        {
            _dbPatternWrapper = dbPatternWrapper;
            Initialize(_dbPatternWrapper);
        }

        private void Initialize(DbAccessPatternWrapper dbPatternWrapper)
        {
            GenericDbQueriesExec = (Q)Activator.CreateInstance(typeof(Q), dbPatternWrapper.GenericDbRepository);
            GenericDbCommandExec = (C)Activator.CreateInstance(typeof(C), dbPatternWrapper);
        }

        public Q Queries() => GenericDbQueriesExec;

        public C Commands() => GenericDbCommandExec;

        // Only valid when constructed via the wrapper overload above.
        protected DbAccessPatternWrapper Create_DAWrapper<D>() where D : DbContext_AlexisBase
            => _dbPatternWrapper.CreateNewInstance<D>();
    }
}
The Test Setup
I ran the review through VS Code using GPT-5.3 Codex, feeding it a structured instructions file rather than manually prompting. I was explicit about what I wanted: not a syntax review, not a style guide check. I wanted the model to evaluate the architectural decisions, understand the intent behind them, and produce a scored review out of 100, followed by concrete recommendations.
Output format: Markdown file plus an HTML version of the report.
The score: 46/100.
Breaking Down Where It Failed
Problem 1: It flagged reflection as a red flag
The library uses reflection to resolve a concrete DbContext at runtime. Codex flagged this. Recommended against it. Treated it as a code smell.
But reflection here isn't a shortcut or a hack. It's the only reasonable mechanism for doing what the library needs to do. When your DbContext is unknown at compile time and must be supplied by a consuming application, you need a way to discover and instantiate it dynamically. Reflection is the answer. There isn't a cleaner one that doesn't involve pushing significant complexity onto every consumer.
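The core move is easy to show in isolation. In this self-contained sketch (LibraryContextBase, ContextResolver, and OrdersContext are illustrative names I made up, not the library's types), the "library" side knows only an abstract base, and the consumer's concrete type is discovered and instantiated at runtime via Activator.CreateInstance:

```csharp
using System;

// "Library" side: knows only the abstraction.
public abstract class LibraryContextBase
{
    public abstract string Describe();
}

public static class ContextResolver
{
    // The concrete type arrives as a runtime value (for example from DI
    // registration or assembly scanning), never as a compile-time reference.
    public static LibraryContextBase Resolve(Type concreteType)
    {
        if (!typeof(LibraryContextBase).IsAssignableFrom(concreteType))
            throw new ArgumentException($"{concreteType.Name} does not derive from LibraryContextBase.");

        return (LibraryContextBase)Activator.CreateInstance(concreteType);
    }
}

// "Consumer" side: the only place the concrete type exists.
public class OrdersContext : LibraryContextBase
{
    public override string Describe() => "OrdersContext";
}

public static class Demo
{
    public static void Main()
    {
        var ctx = ContextResolver.Resolve(typeof(OrdersContext));
        Console.WriteLine(ctx.Describe()); // prints "OrdersContext"
    }
}
```

The alternative to this is forcing every consumer to hand-wire its context into every library entry point, which is exactly the boilerplate the package exists to remove.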
Codex couldn't make this leap. It saw reflection, matched it against the pattern "reflection is often misused," and flagged it. No consideration of why it was there.
Problem 2: DbContext lifecycle criticism missed the point entirely
A big chunk of the 46/100 score came from concerns about how the DbContext lifecycle is managed. In a typical application, this is a legitimate thing to watch. Mismanaging your context can cause memory leaks, stale data, and concurrency issues.
But the critique assumed a single-application context. In a reusable library, the consuming application owns the DbContext. The library works within that constraint. The lifecycle management looks different because the responsibility is distributed differently by design. Flagging it without understanding that boundary isn't a useful review — it's pattern matching against the wrong pattern.
Problem 3: The hybrid ID design went completely over its head
This one I found particularly interesting because it's a genuinely clever pattern that solves a real problem.
Every entity in this library carries two identifiers. The first is a DbId — a sequential integer, auto-generated by the database. I deliberately avoid calling it Id because that name tells you nothing about where it comes from or what it represents. DbId is explicit: it's the database's identifier, and it only exists after a successful write.
The second is a GUID, generated by the application the moment the entity is created. No round trip required. The application can assign identity, pass the entity around, queue it, log it — all before the database has ever seen it.
This hybrid approach gives you the relational database performance benefits of sequential integer keys while also giving you application-level identity without latency. It's a known pattern in distributed systems and event-sourced architectures, though not commonly seen in standard .NET CRUD applications.
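The shape is simple enough to sketch in a few lines. Only DbId is named in the text; "AppId", "EntityBase", and "Order" are placeholder names I chose for the illustration:

```csharp
using System;

// Illustrative sketch of the hybrid-ID shape described above.
public abstract class EntityBase
{
    // Assigned by the application the instant the entity exists:
    // usable for logging, queuing, and correlation before any write.
    public Guid AppId { get; } = Guid.NewGuid();

    // Assigned by the database (sequential, index-friendly).
    // Null until the first successful write.
    public int? DbId { get; internal set; }
}

public class Order : EntityBase { }

public static class Demo
{
    public static void Main()
    {
        var order = new Order();

        Console.WriteLine(order.AppId != Guid.Empty); // True: identity before persistence
        Console.WriteLine(order.DbId.HasValue);       // False: the database hasn't seen it yet

        order.DbId = 42; // what a successful commit would do
    }
}
```

The nullable DbId also makes the entity's lifecycle self-documenting: a null DbId means the row has never been persisted, with no sentinel zero to misread.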
Codex didn't recognise it. It saw the dual-ID design and appeared to treat it as redundancy rather than intentional architecture. No engagement with the performance rationale. No recognition of the distributed systems parallel. Just a flag.
The Real Problem: AI Reviews Code, Not Intent
Here's the honest summary.
If I were rating Codex purely as a code reviewer — syntax, structure, common anti-patterns, adherence to established conventions — I'd give it 8 out of 10. It's genuinely good at that. It knows the patterns. It catches real issues. It would save a junior developer from several embarrassing PRs.
But this review wasn't testing that. It was testing whether an AI model can evaluate architectural decisions in context — understanding the constraints the design is operating under, the trade-offs that were made deliberately, and the intent that isn't written in the code itself.
On that measure: 3 out of 10.
The gap isn't about knowledge. Codex knows what reflection is. It knows what DbContext is. It's seen CQRS documentation and repository pattern implementations thousands of times. The gap is about reasoning outside the code — holding a mental model of why the system is built the way it is, and using that model to interpret what it's looking at.
Current AI models evaluate against known patterns. Deviations get flagged. But the most interesting architectural decisions are almost always deliberate deviations — trade-offs made with full awareness that the conventional approach won't work at this level of abstraction.
The difference between "this looks unusual" and "this is unusual because the design is operating at a different level of abstraction" — that's the gap. And it's a big one.
What This Means for AI-Assisted Code Review
I'm not writing this to dunk on Codex. It's genuinely impressive at what it does. But I think the developer community is at risk of over-indexing on AI code review without being clear-eyed about what it can and can't evaluate.
For greenfield application code following established patterns? AI review is fast, useful, and surprisingly accurate. Use it.
For infrastructure code, reusable libraries, or anything where the architectural decisions are the interesting part? Treat AI feedback as a starting point, not a verdict. The model will flag things that look wrong by convention but are right by design. You need to know the difference.
The 46/100 score isn't a failure of the library. It's a pretty accurate measurement of where the current ceiling is for AI architectural reasoning.
We're not there yet. But it's a useful data point to have.
Building something opinionated in .NET? I'd be interested to hear how you've handled the reusability vs. boilerplate trade-off. Comments are open.