A Humble (Not) Developer’s Point of View


Intro

Since this is a tech blog post, and it’s therefore mandatory to flaunt your pedigree to convince people to even start reading, here we go: I currently have over 15 years of experience in backend and frontend development, my parents sent me to the best (public) schools, I’ve worked at a gazillion companies, I’ve seen every IT hype of the past two decades, and I often use a peremptory tone in my statements, so trust me, I’m super skilled. Just kidding. I’m pretty average, and actually pretty bad at tests. But I have eyes, and a brain, and sometimes I use both. That’s actually the basis of my job. And I’ve noticed that there are some things wrong with how we do tests. If you’re still reading at this point, maybe you want to know what they are? Still there? Great, so let’s start by stating the obvious (warning, this is going to be a bit rambling).


Disclaimers


Now you're warned.



So, Nobody Likes Writing Tests

Except for testers, duh. But most developers generally prefer developing. I don’t want to speak for everyone (OK, you reading, who particularly loves writing and running tests, you’re the exception, happy now?), but this is a trend I’ve noticed. Just as bakers like to make bread, and proctologists... well, you get the idea. No sober developer denies the importance of tests. It’s like rain on a Monday, taking out the trash, cleaning up cat puke, filing your taxes: unavoidable, tedious, and monotonous, but you can’t skip it and you vaguely get why it’s useful. We all know that software isn’t just about writing code; it’s also about the code not doing the wrong thing. I mean, it should do exactly what the PO imagines the client wants. That’s already not an easy problem, but what’s even harder is getting people who only want to code to also test their work, all while they’re being squeezed for more and more features under contractual deadlines obtained with a “Don't worry, we just want a rough estimate” vibe. So how did we respond to such demands, smart people that we are? Of course, by happily


Writing Code That Tests Code


Let’s pause for a minute on this brilliant idea. Programmers are given a problem: sometimes their code does things you don’t want, or doesn’t do things you do want. Their solution? More code, of course! Code to test their code, automatically, easily, and frequently. Problem solved!


Except That


If you expect me to trash the whole thing because of an excess of stupidity or a lack of skill or both, sorry, I’ll have to disappoint you. I love test automation so much that I’ve devoted my career to making it as easy as possible for companies to get the greatest possible automated test coverage. But I’ve noticed a few problems that can crop up in certain contexts:


Writing Automated Tests “Isn’t Fun”


Exactly. We usually know what needs to be coded, but sometimes writing the test for it is the real challenge. You have to mock all the dependencies, unless you’re lucky enough to land on a brand new project with a hyper-rigorous dev and software architect team following every architecturally correct acronym (Don’t Repeat Yourself, You Ain’t Gonna Need It, Single-responsibility/Open-closed/Liskov/Interface-segregation/Dependency-inversion, Single Test Usecase Per Interacting Dependency – OK, I made up the last one). You also need to know all the domain business objects so you can instantiate them properly, and in the correct order, so that the objects involved in the test don’t trip the null pointers hiding in the bowels of the service layer. All that for what? Testing that the insurance policy is invalidated if the date of certain sub-objects is outside the validity period of my ass. A little red indicator that takes hours of tedious work to turn green, with added frustration if the bug wasn't even in the business code but in the test code. Yay!
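To make the complaint concrete, here’s a minimal, self-contained sketch of that ceremony in Python. Every class and name here is hypothetical, invented for illustration; the point is how much scaffolding one little rule requires:

```python
# All names hypothetical: a toy domain plus the mocks needed to test one rule.
from dataclasses import dataclass
from datetime import date
from typing import List
from unittest.mock import MagicMock

@dataclass
class Period:
    start: date
    end: date

@dataclass
class Coverage:
    period: Period

@dataclass
class Policy:
    coverages: List[Coverage]
    validity: Period
    invalidated: bool = False

class PolicyService:
    def __init__(self, repo, notifier, audit_log):
        self.repo, self.notifier, self.audit_log = repo, notifier, audit_log

    def revalidate(self, policy_id):
        policy = self.repo.find(policy_id)
        # Invalidate the policy if any coverage falls outside its validity period.
        for cov in policy.coverages:
            if cov.period.start < policy.validity.start or cov.period.end > policy.validity.end:
                policy.invalidated = True
                self.notifier.send(policy_id)
                self.audit_log.record(policy_id)

def test_policy_invalidated_when_coverage_exceeds_validity():
    # Mock every dependency the service layer touches...
    repo, notifier, audit_log = MagicMock(), MagicMock(), MagicMock()
    # ...and instantiate the domain objects in exactly the right order.
    policy = Policy(
        coverages=[Coverage(Period(date(2024, 1, 1), date(2026, 1, 1)))],
        validity=Period(date(2024, 1, 1), date(2025, 1, 1)),
    )
    repo.find.return_value = policy

    PolicyService(repo, notifier, audit_log).revalidate(policy_id=42)

    assert policy.invalidated is True  # the little red indicator, hopefully green
```

With enough experience, you eventually realize you’ve been living in a big lie, and sometimes some coders wake up and realize in shock that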


Code Is Not an Asset

In fact, it’s the opposite. The more code you have, the more likely you are to have bugs. But you probably think your test code is different? After all, it’s simpler than your business code, right? Actually, no. By its very nature (it exercises other parts of your program), your test code is tied to your business code, and that entanglement is pretty much the literal definition of complexity (as in “interlaced with”); a test with no link to the business code is pointless. Worse yet, the mapping from your test code onto your business code is likely surjective, because, more often than not, you’ll want to test several cases for each method you’re testing. Theoretically, with 100% coverage, M methods and an average of N test cases per method (N greater than or equal to 1) give you roughly N*M tests; 50 methods at 3 cases each is already 150 tests.


“Oh come on, nobody has 100% coverage, we’re usually OK with 80%.” That’s true! And yet in many cases I’ve observed that you can have nearly as much


As Much Test Code as Business Code

This might seem unbelievable to you, or make you envious, or you might just shrug and say “well, that’s how it is everywhere,” but yes, I’ve seen this often. And everyone acts like it’s completely normal. But just think about it for a minute. Test code is still code. You have to maintain it, debug it, and test it. If you’ve never seen a ticket that says “Refactor the test code,” you’re either extremely lucky, haven’t worked in the industry for long, both, or something else I can’t guess. And if you’ve never suffered through endless peer review comments on automated test code, I beg you, leave the industry and save your soul before it happens to you. Yet none of this is surprising. Test code, like all code, is “alive” as long as it’s being worked on, and as the team grows in skill and understanding of the code, they will inevitably want to make it clearer, more efficient, and easier to handle. It even starts off at a disadvantage, because test code is often more cryptic than the business code itself. And that’s fine; refining it along the way is even desirable! ...Except that it takes time and thought. Is that really the best place to put your hard-earned and expensive effort? Because the alternative, not doing it, isn’t really desirable either. In the long run, it leads to a growing gap in quality between business code and test code. We end up in that situation where this part of the code is like the underbelly of Night City: everything depends on it, but no one goes there unless forced to. We don’t want that, so we just endure and wait for redemption, since The Prophecy says that


AI Will Save Us

Why bother writing and maintaining test code ourselves? After all, you just have to ask, “generate tests for this endpoint,” and ask again as needed, while hoping the LLM doesn’t hallucinate too much, doesn’t introduce too many errors from its training data (or at least not too serious ones), and that the generated tests look meaningful from afar. After all, the only thing that matters is clearing the code coverage threshold so you can commit or pass the PR. The other option is to carefully write them by hand, mocking and instantiating the necessary test objects in the right order, sweating until that damned progress bar turns green, then copy-pasting and hoping it works as well for slightly different cases. (Seriously, who still does that?) Either way, you end up with even more test code, and “TestUtils”-named classes bloating uncontrollably as their code makes less and less sense. And it’s still boring! No fancy design patterns, no fun smart logic to implement, no intellectual challenge. Just writing those friggin’ tests. Imagining scenarios that will probably never happen, testing trivial things like making sure adding two variables works, or writing tests so heavily mocked that you end up with a bug in the app while the test that’s supposed to catch it passes. And in the end, explaining at the daily standup why you didn’t finish a task because “we hit a problem with the tests.” Am I the only one who’s been through all this?
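That last failure mode, the over-mocked test that goes green while production burns, deserves a concrete sketch (hypothetical names, Python again):

```python
# A sketch of an over-mocked test: the bug is real, the test passes anyway.
from unittest.mock import MagicMock

def apply_discount(price_service, cart_total):
    # Business code with a real bug: the discount is ADDED instead of subtracted.
    discount = price_service.discount_for(cart_total)
    return cart_total + discount  # BUG: should be cart_total - discount

def test_apply_discount():
    price_service = MagicMock()
    price_service.discount_for.return_value = 0.0  # mocked into irrelevance
    # With the discount mocked to zero, the buggy '+' is indistinguishable
    # from the correct '-', so this test stays green while the app is wrong.
    assert apply_discount(price_service, 100.0) == 100.0
```

Anyway, my point is, I don’t see how AIs will fix the fundamental problem that, generated or not,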


Code Remains a Burden

Any code can have bugs, inconsistencies, readability issues, or any other problem. Add to that the fact that, if it’s generated, no one can really account for why it exists: “The AI just generated it that way, no idea why, but it seems to work.” And that goes for test code, too. They often say that good test coverage allows you to refactor code easily, and you might think the test code itself rarely needs refactoring, as if it were somehow self-guaranteeing. The first part is true, but I feel like a lot of the literature on the subject, especially regarding TDD, forgets to mention that


Refactoring Isn’t Free

Even if you have super good test coverage and German-level TDD discipline, anything you do to the code still takes time and intellectual effort, either to write it or to review it after it’s generated. That’s why (heresy!) I’m still not a TDD enthusiast. I see its value in many contexts, but I haven’t yet accepted the TDD grace as the savior of my little developer’s heart. I still code in sin (often). Actually, there are several reasons for this, one being that thinking about the code for the tests before thinking about it for the user need can create alignment issues. That is, TDD often involves taking at least three shaky assumptions as biblical truths:


1– User Needs and Tests Even Relate to Each Other

In other words, you just write enough tests, and eventually you decide the client’s needs are met. That’s weird. How many is enough, exactly? Sometimes it seems like the reverse of the reductionist approach we’re used to in software development. It’s like describing what a bird is, not by its characteristics, behaviors, or where it lives, but by giving examples of birds, non-birds, what it does or doesn’t do, where you do or don’t find it, etc. While “comparison isn't reason,” this still feels much more like a machine learning way of thinking than a human one. And it seems to lead straight to the same alignment problems as in machine learning: code that passes the tests but only approximates user needs. This is even explicitly encouraged in TDD when you start a task! It would work if refactoring were free and instant, but in the real world, I still find it weird.


2– Writing Code For Tests Automatically Improves Architecture! (Yeah, right)

I’m not saying this is untrue. Compared to spaghetti code, writing code with testing in mind can sometimes result in better architecture, just like driving one-handed on a country road with one eye closed is safer than driving blindfolded in reverse on the highway in the fog. But this argument strikes me the same way as “We should give billions to space companies because it sometimes leads to innovations for society.” I want to reply, “OK, but wouldn’t it be way more efficient to give those billions directly to innovations for society?” Similarly, what if we thought directly about well-architected code (whatever that means; that’s another article for another time, relax)?


3– The Tests Are Our Specs!


I think anyone who says this unironically in public should pay a fine, or at least put a buck in the swear jar. No, your tests are not your specifications. Stop insulting the people (i.e., the PO, the business analyst, the software architect, or whoever) who write specs. Your tests are ugly code (even uglier than your business code, because it’s “just test code, right?”), so ugly that your QAs don’t like playing with them, even though we’ve set up frameworks to make them believe they can modify and create tests in natural language (the gap is about the same as between the photo of a prepared dish and what’s actually in the plastic tray).

And yes, I know, XXXXXXX from QA (or from the dev team that grabs all the QA tickets) loves using Cucumber (among other abominations), or writes test code while singing so loudly it echoes all over the open space and attracts birds and squirrels. Let me tell you, XXXXXXX…


Ask for a Raise Right Now

Because it sometimes takes years or even decades of training, Uncle Bob videos, snarky PR comments, and other passive-aggressive commit hooks for a developer to reach your level of Stockholm syndrome with tests. Basically, we just wanted to develop cool features, remember? But hey, honestly, don’t worry, I’m not judging. Everyone has their kinks.



OK, Great, So What Do You Suggest, Mr. Smarty Pants?

It’s not enough to just list the pain points; you have to propose solutions too. And since this isn’t your first tech article, you probably think I have something to sell you. And yes, you’re absolutely right! But it won’t be unoriginal online courses, AI-generated conference talks, or phony Agile/TDD seminars. I have too many scruples (and I’m too lazy), and I don’t want to make Sam Altman richer! More importantly, I want to try proposing something new (even if innovation in quality assurance is always less sexy than in every other field). This is what I'm trying to offer with Yesbot, a test automation platform built on alternative test implementation principles.


We Don’t Write Tests Anymore, Period!

Imagine that, overnight, aliens arrive and tell us, "Okay, enough nonsense, you have three months, and if you don’t create an ideal society by then, we’ll glass the planet. Good luck. Bye." So humanity throws together a utopia in a panic, and everyone is happy. To get there, let’s suppose one of the first measures is to eliminate the need to work, so that everyone lives at the same comfort level. But then, who will still do the dirty jobs? No one, because no one will have to. The need remains, though, so what’s left to do? Yes, exactly: eliminate the need itself, where we can. (Good, you’re following.) For example, if no one wants to take out the trash, then we’ll do everything possible to generate almost no garbage (otherwise, planetary vitrification: pretty good motivation). The corollary is that at present we don’t have to rack our brains, since we can just economically force some people to do these jobs, on pain of ending up homeless. So we don’t even think about it (at best, we sort recycling into a bunch of bins before it’s mixed back together at the nearest landfill).


So, on a smaller scale, here’s my thought: I don’t like writing tests, but instead of asking, “How can I make myself like it?” I asked, “How can I stop having to write tests, while still making my code reliable?” The solution is simply to generate them, whether in a simple, algorithmic, deterministic way, or a bit more stochastically, with language models.
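For a taste of the algorithmic end of that spectrum, here’s something that already exists today: property-based testing with the (real) Python library Hypothesis, where you state a property once and the tool invents the concrete cases for you:

```python
# Machine-generated test cases with Hypothesis: you write the property,
# the library generates the inputs and shrinks any failure it finds.
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.lists(st.integers()), st.integers())
def test_membership_survives_sorting(xs, x):
    assert (x in xs) == (x in sorted(xs))
```

But to go all the way down this road, you have to make a big mental leap (caution: the next paragraphs might upset your sensibilities), which is to start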


Considering Tests as a Commodity

I think one of the “original sins” of the old approach is assigning intrinsic value to tests, a view inherited from our attitude toward code in general. For me, seeing test code as an asset is like assigning the same value to the scaffolding around a building under renovation as to the building itself. Once you free yourself from that assumption, you no longer hesitate to generate, delete, regenerate, and auto-fix tests under the supervision of someone who knows the real-world use cases. And it turns out all these operations are pretty automatable! In short, tests are disposable; your code is not. I warned you this would be shocking. And you haven’t seen anything yet. For example:


Forget About Low-Level Tests

This hot take is a little risky because it’s philosophical, but if you’ve made it this far (thanks!), hang in there, it’s almost done. I think we make a huge deal out of tests that cover the "internal gears," but honestly, I don’t find them that useful. OK, for some complicated calculations, they can help ensure some level of reliability, but ultimately, if we test everything thoroughly at the highest level (for example, API or UI tests), do we really care whether some function in the guts does its job? Here’s an extreme example: neural networks in ML. Who on this sweet earth would write unit tests for the individual weights of a neural network, when what matters is, ultimately, what the model produces? Imagine your system does everything you want at the highest level: who cares if internally it goes through methods X, Y, Z, passes back stuff instead of things, or even has a bunch of elves passing notes to perform the logic? (That’s a bit of my functionalist view.) You might argue that low-level tests give more fine-grained control and visibility into the code; there’s a whole research field in ML about the interpretability and explainability of neural nets, and I agree with all of that! All I’m saying is that it shouldn’t matter as much; it should, in fact, be secondary to high-level tests. If you want to write unit tests for the addition function, go ahead, knock yourself out! But I think failures in those tests should never block a build (as long as the high-level tests pass), and they should be the first to be disabled if there’s a problem. What I’ve seen in practice is often the opposite: high-level tests, being more complex and seen as less stable, are the first to be switched off in a panicked release “just for now” (which is technical-speak for “forever”). I find this very weird. So, to sum up: please focus mainly on high-level tests, and generate them as much as possible.
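To illustrate what I mean by “highest level,” here’s a sketch of an API-level test for the insurance rule from earlier. The base URL and the /policies endpoint are hypothetical; the point is that it asserts only on observable behavior:

```python
# A high-level test: hit the deployed API like a client would.
# BASE_URL and the /policies endpoint are made up for illustration.
import requests

BASE_URL = "https://staging.example.com"

def test_out_of_range_coverage_invalidates_policy():
    resp = requests.post(f"{BASE_URL}/policies", json={
        "validity": {"start": "2024-01-01", "end": "2025-01-01"},
        "coverages": [{"start": "2024-01-01", "end": "2026-01-01"}],
    })
    assert resp.status_code == 201
    # We don't care which methods, elves, or neural weights did the work,
    # only that the observable result is right.
    assert resp.json()["invalidated"] is True
```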


OK, So What, Do We Fire All the QAs?

Great idea. But then who will have the expertise to use your system? Who will check that the generated tests are correct, and be able to link them to the impacted code? Who will direct the algorithms and initiate the tests? The idea is to give QAs superpowers, not to replace them. That’s neither desirable nor possible, and you should be leery of anyone who claims otherwise (odds are, the folks saying that are the ones who should politely be shown the exit). More importantly, it’s not something you should ever want, for all the reasons mentioned. If your goal is to truly improve your code’s reliability, a code coverage percentage is never a substitute for years of expertise (seems obvious, but worth saying over and over again), just as no metric reflects your architecture’s quality, perfectly or even remotely.


That was about creating tests. As for running them, I feel like we could do better there too. What if we decided to


Completely Rethink Test Execution Timing

How would you do it? You all know the traditional approach: write code (tests or business), then stop, then run the tests. Make changes. Re-run the tests. If it’s in a deployed environment, add a deployment and/or review step after the changes. And loop until the client is happy, pays a huge pile of money, and we party (OK, that’s just in my dreams; in reality it’s a lame, wordy PowerPoint and lukewarm filter coffee at a Monday retro). But haven’t you noticed something odd in this cycle? Yes: it’s asynchronous and dev-centered. Is that a problem? Not really. It means that in most cases the developer or a QA triggers the automated tests. (If your pipeline runs tests on other events, like releases, tags, or deploys, it’s still running them ahead of time, just to double-check.) When there’s a red, the dev has to figure out what happened and why the test failed. For manual tests, though, it’s synchronous and QA-centered. The QA decides when to run tests on deployed environments (usually after her morning tea), and she knows exactly when a bug happens, after what action, and on which screen (so it’s synchronous). She then files a bug report for the dev team, and some guy in jeans sighs when it hits his desk. Now imagine doing this with automated tests: they could run periodically, or be triggered by deployments, or whenever, completely independently of the dev cycle. The advantage: no more commits blocked by tests. The downside: you’d have to track deployments rigorously, mapping each one to its commits, in order to identify which commit broke things. But if you manage that, you’d automatically know which high-level actions (that is, which exact payloads) caused the problem. And with good logs, what the application state was at the time.
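A minimal sketch of what that decoupled loop could look like (everything here is hypothetical: the /version endpoint exposing the deployed commit, the paths, the hourly cadence):

```python
# Run the high-level suite on a timer, stamp each run with the deployed
# commit, and log the result; diffing the first red run against the last
# green one then points at the deployment (and commits) to blame.
import datetime, json, subprocess, time, urllib.request

BASE_URL = "https://staging.example.com"  # hypothetical environment

def deployed_sha():
    # Assumes the app exposes its build commit on a /version endpoint.
    with urllib.request.urlopen(f"{BASE_URL}/version") as resp:
        return json.load(resp)["commit"]

def suite_is_green():
    return subprocess.run(["pytest", "tests/high_level"]).returncode == 0

while True:
    record = {
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "commit": deployed_sha(),
        "green": suite_is_green(),
    }
    with open("test_runs.jsonl", "a") as f:  # append-only run history
        f.write(json.dumps(record) + "\n")
    time.sleep(3600)  # hourly, independent of anyone's commits
```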


But I Still Want Low-Level Tests!

OK. Let’s say for a second you’re not convinced by my functionalist argument and you want to make sure your app’s internal mechanics aren’t going haywire. So let’s go further: imagine that instead of (or in addition to) tests, we have “whistles” scattered throughout the code. These are like tests in that they take inputs and compare outputs to expected ones, but they run all the time and report their state changes in real time. In other words, imagine an assembly line: currently, testing means changing something, then running the line once to see whether the product matches expectations at various stages. What I’m proposing is sensors everywhere, reporting information as the line runs. You could detect performance dips and breakdowns in real time, completely replacing sporadic low-level test runs with continuous execution. And the best part is, we can generate these pretty easily too!
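Since “whistle” is my own term, here’s a minimal sketch of how one could be implemented in Python (the decorator, the names, and the business function are all hypothetical):

```python
# A "whistle": a production-time check that wraps a function, verifies an
# expected property on every call, and reports state CHANGES in real time.
import functools, logging, time

logging.basicConfig(level=logging.INFO)

def whistle(expectation, name):
    def decorator(fn):
        state = {"ok": None}  # last known state, to report only transitions
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = (time.perf_counter() - start) * 1000
            ok = expectation(args, kwargs, result)
            if ok != state["ok"]:  # blow the whistle only on a change
                state["ok"] = ok
                level = logging.INFO if ok else logging.ERROR
                logging.log(level, "whistle %s: %s (%.1f ms)",
                            name, "ok" if ok else "VIOLATED", elapsed)
            return result
        return wrapper
    return decorator

@whistle(lambda args, kwargs, result: result >= 0, name="non_negative_total")
def cart_total(prices):  # hypothetical business function
    return sum(prices)
```

Unlike a test run, this costs one check per call rather than a separate execution: the line keeps moving, and the sensors keep reporting.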


So What, Generate Everything?

This whole post (which was supposed to be under 4 pages) comes from my dislike of writing, maintaining, and running tests, and my deep frustration at having to do it anyway. But unlike most other parts of software development, I think these tasks are the ones that could most easily be handled by automation. I’d even say the whole industry would benefit from automating them. And yet, I hardly see any AI bros or VC-hyped startups tackling this. I guess it’s sexier for an investor to promise firing developers than to promise better code reliability for minimal effort. To address this, I’ve spent years developing Yesbot, and with this tool I hope to put into action a concrete approach to testing, one that questions, in its entirety, the testing tradition the industry has leaned on.


In Conclusion

Writing code is a hard job, and ensuring it works is even harder. Over time we’ve developed methods, all with pros and cons, but I think it’s high time to dust them off a bit. I don’t think everything should be thrown away; some parts of test code are worth keeping, for example, and my approach has its own strengths and weaknesses, and it certainly won’t solve all your emotional and relationship problems. It’s up to you to try it, or to drop it in the bin labeled “too different from what we usually do, and we don’t have the time or resources to try it because the release is in a week.” But in any case, if, like me, you’ve always hated writing tests and always felt there should be a way to stop, look no further, it already exists! Head over to https://yesbot.alqemia.com/getting-started/, contact me at [email protected], and let’s get started!