Look at the GIF below — it shows a real-time Todo-MVC demo, syncing across windows and smoothly transitioning in and out of offline mode. While it’s just a simple demo app, it showcases important, cutting-edge concepts that every web developer should know.

This is a Replicache demo app that I ported from an Express backend and web components frontend to SvelteKit to learn about the technology and concepts behind it. I want to share my learnings with you.

The source code is available on GitHub.

Context and motivation

Web applications face some fundamentally hard problems, problems most web frameworks seem to ignore. These problems are so hard that only very few apps actually solve them well, and those apps stand head and shoulders above other apps in their respective space.

Here are some such problems I had to deal with in actual commercial apps I worked on:

  1. Getting the app to feel snappy even when it talks to the server, even over a slow or patchy network. This applies not only to the initial load time but also to interactions after the app has loaded. SPAs were an early and ultimately insufficient attempt at solving this.
  2. Implementing undo/redo and version history for user-generated content (e.g., site building, e-commerce, online course builders).
  3. Getting the app to work correctly when open simultaneously by the same user on multiple tabs/devices.
  4. Handling long-lived sessions running an old version of the frontend, which users might not want to refresh to avoid losing work.
  5. Making collaboration features/multiplayer functionalities work correctly and near real-time, including conflict resolution.

I encountered these problems while working on totally normal web applications, nothing too crazy, and I believe most web apps will hit some or all of them as they gain traction.

A pattern I noticed in dev teams that start working on a new product is to ignore these problems completely, even if the team is aware of them. The reasoning is usually along the lines of "we'll deal with it when we start actually having these problems." The team would then go on to pick some well-established frameworks (pick your favorite) thinking these tools surely offer solutions to any common problem that may arise. Months later, when the app hits ten thousand active users, reality sinks in: the team has to introduce partial, patchy solutions that add complexity and make the system even more sluggish and buggy, or rewrite core parts (which no one ever does right after launch). Ouch.

I felt this pain. The pain is real.

Enter "Sync Engine."

What the hell is a sync engine?

Remember I said that some apps address these issues much better than others? Recent famous examples are Linear and Figma. Both have disrupted incredibly competitive markets by being technologically superior. Other examples are Superhuman and, a decade prior, Trello. When you look into what they did, you discover that they all converged on very similar patterns, and they all developed their respective implementations in-house. You can read about how they did it (highly recommended) in these links: Figma, Linear, Superhuman, Trello (series).

At the core of the system, there is always a sync engine that acts as a persistent buffer between the frontend and the backend. At a high level, this is how it works:

Different implementations of sync engines make different tradeoffs, but the basic idea is always the same.

Not a new idea but...

If you've been following trends in the web-dev world, you'd know that sync engines have been a centrepiece in several of them, namely: progressive web apps, offline-first apps, and the lately trending term: local-first software. You might have even looked into some of the databases that offer a built-in sync engine such as PouchDB or online services that do the same (e.g., Firestore). I have too, but my general feeling over the last few years has been that none of it is quite hitting the nail on the head. Progressive web apps were about users "installing" shortcuts to websites on their home screens as if they were native apps, even though not needing installation is arguably "the" benefit of the web. "Offline-first" made it sound like offline mode is more important than online, which for 99% of web apps is simply not the case. "Local-first" is admittedly the best name so far, but the official local-first manifesto talks about peer-to-peer communication and CRDTs (a super cool idea but one that is rarely used for anything besides collaborative text editing) in a world of full client-server web applications that are trying to solve practical problems like the ones I described above. Ironically, many tools that are part of the current "local-first" wave adopted the name without adopting all the principles.

The one that drew my attention and interest the most is called "Replicache." Specifically, I was intrigued by it exactly because it's NOT a self-replicating database or a black-box SaaS service that you have to build your entire app around. Instead, it offers much more control, flexibility, and separation of concerns than any off-the-shelf solution I have encountered in this space.

What is Replicache?

Replicache is a library. On the frontend, it requires very little wiring and effectively functions as a normal global store (think Zustand or a Svelte store). It has a chunk of state (in our example, each list has its own store). It can be mutated using a set of user-defined functions called "mutators" (think reducers) like "addItem", "deleteItem," or anything you want, and exposes a subscribe function (I am simplifying, full API here).
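To make the "normal global store" idea concrete, here is a minimal sketch of a store driven by named mutators with a subscribe function. It is a toy model written for illustration, not the Replicache API: `createTodoStore`, the `Todo` shape, and the in-memory `Map` are all assumptions, and there is no syncing or persistence here.

```typescript
// Toy model of a mutator-driven store. NOT the Replicache API; every name
// below (createTodoStore, Todo, mutators) is hypothetical.
type Todo = { id: string; text: string; completed: boolean };
type TodoState = ReadonlyMap<string, Todo>;

// Each mutator is a pure function from (state, args) to the next state,
// much like a reducer.
const mutators = {
  addItem(state: TodoState, todo: Todo): TodoState {
    return new Map(state).set(todo.id, todo);
  },
  deleteItem(state: TodoState, id: string): TodoState {
    const next = new Map(state);
    next.delete(id);
    return next;
  },
};

type Listener = (todos: Todo[]) => void;

function createTodoStore() {
  let state: TodoState = new Map();
  const listeners = new Set<Listener>();
  const emit = () => listeners.forEach((fn) => fn(Array.from(state.values())));

  return {
    // Apply a mutator by name, then notify every subscriber.
    mutate: {
      addItem(todo: Todo) {
        state = mutators.addItem(state, todo);
        emit();
      },
      deleteItem(id: string) {
        state = mutators.deleteItem(state, id);
        emit();
      },
    },
    // Calls the listener immediately and on every change;
    // returns an unsubscribe function.
    subscribe(fn: Listener): () => void {
      listeners.add(fn);
      fn(Array.from(state.values()));
      return () => {
        listeners.delete(fn);
      };
    },
  };
}
```

A UI component would call `store.subscribe(render)` once and then trigger changes only through `store.mutate.addItem(...)` or `store.mutate.deleteItem(...)`, never by touching the state directly.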

Behind this familiar interface lies a robust and performant client-side sync engine that handles:

  1. Initial full download of the relevant data to the client.

  2. Pulling and pushing "mutations" to and from the backend. A mutation is an event that specifies which mutator was applied, with which parameters (plus some metadata).

    • When pushing, these changes are applied optimistically on the client, and rolled back if they fail on the server. Any other pending changes would be applied on top (rebase).

    • The sync mechanism also includes queuing changes if the connection is lost, retry mechanisms, applying changes in the right order, and de-duping.

  3. Caching everything in memory (performance) and persisting it to the browser storage (specifically IndexedDB) for backup.

  4. Since the same storage is accessible from all the tabs of the same application, the engine deals with all the implications of that—like what to do when there was a schema change but some tabs have refreshed and some haven't and are still using the old schema.

  5. Keeping all the tabs in sync instantly using a broadcast channel (since relying on the shared storage is not fast enough).

  6. Dealing with cases in which the browser decides to wipe out the local storage.

You might have noticed that this right here addresses a big chunk of the problems I listed at the top of this post. Being mutations-based also lends itself to features like undo/redo.
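The optimistic-update and rebase behaviour described in the list above can be sketched in a few lines. This is a simplified model under stated assumptions, not Replicache's internals: `OptimisticClient`, the `Mutation` shape, and the idea that a pull returns a full state plus the last mutation ID the server processed are all illustrative.

```typescript
// Toy model of optimistic updates with rebase. Names and shapes are
// hypothetical, not Replicache's internals or wire format.
type ViewState = Record<string, string>; // todo id -> text
type Mutation = {
  id: number;
  name: "addItem" | "deleteItem";
  args: { id: string; text?: string };
};

function applyMutation(state: ViewState, m: Mutation): ViewState {
  const next = { ...state };
  if (m.name === "addItem") next[m.args.id] = m.args.text ?? "";
  else delete next[m.args.id];
  return next;
}

class OptimisticClient {
  private confirmed: ViewState = {}; // last state received from the server
  private pending: Mutation[] = []; // applied locally, not yet confirmed
  private nextId = 1;

  // The UI always reads the confirmed state with pending mutations replayed on top.
  get view(): ViewState {
    return this.pending.reduce(applyMutation, this.confirmed);
  }

  // Queue the mutation; it shows up in `view` right away, before any push.
  mutate(name: Mutation["name"], args: Mutation["args"]): Mutation {
    const m: Mutation = { id: this.nextId++, name, args };
    this.pending.push(m);
    return m;
  }

  // On pull: adopt the server's state and drop mutations it has already
  // processed. Whatever is still pending gets replayed (rebased) by `view`.
  pull(serverState: ViewState, lastMutationID: number): void {
    this.confirmed = serverState;
    this.pending = this.pending.filter((m) => m.id > lastMutationID);
  }
}
```

In this model a rollback needs no special code path: if the server processed a mutation but rejected it, the mutation leaves the pending queue and its effect is simply absent from the server state, so it disappears from `view` on the next pull.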

In order for all of this to work, it's your backend's job to implement the protocol that Replicache defines. Specifically:

  1. You need to implement push and pull APIs. These endpoints need to be able to activate mutators similarly to the frontend (though they don't have to run the same logic). The backend is authoritative, and conflict resolution is done by your code within the mutator implementation.
  2. Your database needs to support snapshot isolation and run operations within transactions.
  3. The Replicache client polls the server periodically to check for changes, but if you want close to real-time sync between clients, you need to implement a "poke" mechanism, namely a way to notify the clients that something has changed and they need to pull now. This could be done via server-sent events or websockets. It's an interesting API design choice: changes are never pushed to the client; the client always pulls them. I believe it is done this way for simplicity and ease of reasoning about the system. One thing is for sure: it's good that they didn't make websockets mandatory because that would have made the protocol incompatible with HTTP (server-sent events stream over a normal HTTP connection), which would have required extra infrastructure and presented additional integration challenges.
  4. Depending on the versioning strategy, you might need to implement additional operations (e.g., createSpace).
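As a rough illustration of the backend responsibilities listed above, here is an in-memory sketch of push, pull, and poke. The request and response shapes are simplified assumptions rather than Replicache's actual wire format, and names such as `handlePush`, `handlePull`, `serverMutators`, and `pokeListeners` are hypothetical. A real backend would run each handler inside a database transaction and would typically return a diff rather than a full snapshot.

```typescript
// In-memory sketch of the server side. Shapes and names are simplified
// assumptions, not Replicache's actual protocol.
type PushedMutation = { clientID: string; id: number; name: string; args: any };
type ServerTodo = { id: string; text: string; completed: boolean };

const todos = new Map<string, ServerTodo>();
const lastMutationIDs = new Map<string, number>(); // last processed id per client
let version = 0; // bumped on every successful change
const pokeListeners = new Set<() => void>(); // e.g. one per open SSE connection

// Server-side mutators: same names as on the client, possibly different logic.
const serverMutators: Record<string, (args: any) => void> = {
  addItem: (todo: ServerTodo) => {
    todos.set(todo.id, todo);
  },
  deleteItem: (id: string) => {
    todos.delete(id);
  },
};

function handlePush(mutations: PushedMutation[]): void {
  for (const m of mutations) {
    const last = lastMutationIDs.get(m.clientID) ?? 0;
    if (m.id <= last) continue; // already processed: de-dupe retries
    if (m.id > last + 1) break; // gap: stop until the missing mutation arrives
    try {
      const mutator = serverMutators[m.name];
      if (!mutator) throw new Error(`unknown mutator ${m.name}`);
      mutator(m.args);
      version++;
    } catch {
      // A failed mutation is skipped but still recorded as processed below,
      // so the client stops retrying it and drops its optimistic result.
    }
    lastMutationIDs.set(m.clientID, m.id);
  }
  // The poke carries no data; it only tells clients to pull now.
  pokeListeners.forEach((poke) => poke());
}

function handlePull(clientID: string) {
  return {
    version,
    lastMutationID: lastMutationIDs.get(clientID) ?? 0,
    todos: Array.from(todos.values()), // full snapshot, for simplicity
  };
}
```

Keeping the poke free of data means the pull endpoint stays the single path by which state reaches a client, which matches the "client always pulls" design described above.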

If it sounds non-trivial to you, you are right. I don't think I fully wrapped my head around all the details of how it operates with the database. I'll need to do a follow-up project in which I totally refactor the database structure and/or add meaningful features to the example (e.g., version history) in order to get closer to fully grokking it. The thing is, I know how valuable this level of control is when building and maintaining real production apps. In my book, spending a week or two thinking deeply about and setting up the core part of your application is a great investment if it creates a strong foundation to build and expand upon.

Porting a non-trivial example

The best (and arguably only) way to learn anything new is by getting your hands dirty, dirty enough to experience some of the tradeoffs and implications that would affect a real app. As I was going over the examples on the Replicache website, I noticed there were none for SvelteKit. I have been a huge Svelte fan since Svelte 3 was released, but only recently started playing with SvelteKit. I thought this would be an awesome opportunity to learn by doing and create a useful reference implementation at the same time.

Porting an existing codebase to a different technology is educational because, as you translate the code, you are forced to understand and question it. Throughout the process, I experienced multiple eureka moments as things that seemed odd at first clicked into place.

Learnings

SvelteKit

  1. SvelteKit doesn't natively support WebSockets, and even though it does support server-sent events, it does so in a clumsy way. Express supports both nicely. As a result, I used svelte-sse for server-sent events. One somewhat annoying quirk I ran into is that since svelte-sse returns a Svelte store, which my app wasn't subscribing to (the app doesn't need to read the value, just to trigger a pull as I described above), the whole thing was just optimized away by the compiler. I was initially scratching my head about why messages were not coming through. I ended up having to implement a workaround for that behavior. I don't blame the author of the library; they assumed a meaningful value would be sent to the client, which is not the case with "poke".

  2. SvelteKit's filesystem-based routing, load functions, layouts, and other features allowed for a better-organized codebase and less boilerplate code compared to the original Express backend. Needless to say, on the frontend, Svelte is miles ahead of web components, resulting in a frontend codebase that is smaller and more readable even though it has more functionality (the original example TodoMVC was missing features such as "mark all as complete" and "delete completed").

  3. Overall, I love SvelteKit and plan to keep using it in the future. If you haven't tried it, the official tutorial is an awesome introduction.

Replicache

Overall, I am super impressed by Replicache and would recommend trying it out. At the basic level (which is all I got to try at this point), it works very well and delivers on all its promises. With that said, here are some general concerns (not todo app related) I have and thoughts related to them:

Should a sync engine be used for everything?

No, a sync engine shouldn't be used for everything. The good news is that you can have parts of your app using it while other parts still submit forms and wait for the server's response in the conventional manner. SvelteKit and other full-stack frameworks make this integration easy.

Obvious situations where using a sync engine is a bad idea:

  1. Optimistic updates make sense only when client changes are highly likely to succeed (with rollbacks being rare) and when the client possesses enough information to predict outcomes. For instance, in an online test where a student's answer must be sent to the server for grading, optimistic updates (and hence a sync engine) wouldn't be feasible. The same applies to critical actions such as placing orders or trading stocks. A good rule of thumb is that any action dependent on the server and incapable of functioning offline should not rely on a sync engine.

  2. Any app dealing with huge datasets that cannot fit on user machines. For example, creating a local-first version of Google or an analytics tool processing gigabytes of data to generate results is impractical. However, in scenarios where partial synchronisation suffices, a sync engine can still be beneficial. For instance, Google Maps can download and cache maps on client devices to operate offline, without needing high-resolution maps for every location worldwide all the time.

A word on developer productivity and DX

My impression is that having a sync engine can make DX (developer experience) much nicer. Frontend engineers just work with a normal store whose updates they can subscribe to, and the UI always stays up to date. There is no need to think about fetching anything or calling APIs or server actions for the parts of the app that are governed by the sync engine. On the backend, I can't say much yet. It seems like it won't be harder than a traditional backend, but I can't say for sure.

Closing thoughts

It's exciting to imagine the future of web apps as planet-scale, real-time multiplayer collaboration tools that work reliably regardless of network conditions, while at the same time making the nasty problems I started this post with a thing of the past.

I highly recommend that fellow web developers familiarize themselves with these new concepts, experiment with them, and maybe even contribute. Thanks for reading. Leave a comment if you have any questions or thoughts. Peace.

P.S. This interview with Aaron Boodman, the founder of the company that created Replicache, is great. Watch it and thank me later.