In the summer of 2021, I worked on one of the most complex projects I’ve ever put my hands on: a game. Gaming was the spark that set many software engineers on their programming journey, and I’d be lying if I said it wasn’t a factor for me as well.
I got to build the prototype for an online turn-based card game similar to Magic: The Gathering and Hearthstone. I was ecstatic. Not a single line of code had been written yet. It was a blank canvas, and I got to paint it.
It’s rare to get the chance to design a whole system. Until then, I had only ever worked on part of one or added components to an existing architecture, making small decisions and changes.
It was an exciting challenge that pushed my understanding of architecture, design, and even UI development.
Designing the Prototype
Despite my lack of experience with game engines, I managed to put one together. It wasn’t performant, and best practices were a distant afterthought. But I got it to work, and that was everything I wanted from a prototype.
With that finished, the next challenge on my hands was the infrastructure. If the game engine was a daunting task because of my lack of knowledge, networking and communication seemed impossible, at least at first. The real-time nature of multiplayer games meant the system would be complex to design.
But when you start deconstructing the problem, you see patterns and problems common to most event-driven systems. I had to find a way to accept user input, run it through the game engine, and update the board state. A plain request-response approach wouldn’t be effective, since every change has to be shown to both players, not only to the one who sent the request.
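Stripped of the networking, the core of that loop is a function that takes the current board state and a player’s action, and either rejects the action or returns the next state. Here is a minimal sketch, assuming a hypothetical model where a card only has an id and a mana cost; `Card`, `PlayerState`, `GameState`, and `applyAction` are illustrative names, not the prototype’s code.

```typescript
// Hypothetical, minimal game model for illustration; not the prototype's actual data structures.
interface Card {
  id: string;
  cost: number;
}

interface PlayerState {
  hand: Card[];
  mana: number;
  board: Card[];
}

interface GameState {
  turn: string; // id of the player whose turn it is
  players: Record<string, PlayerState>;
}

interface PlayCard {
  type: "PLAY_CARD";
  playerId: string;
  cardId: string;
}

type Result = { ok: true; state: GameState } | { ok: false; reason: string };

// Validate a player's input and, if it is legal, return the next board state.
// The previous state is never mutated, so the caller decides what to keep.
function applyAction(state: GameState, action: PlayCard): Result {
  if (state.turn !== action.playerId) return { ok: false, reason: "not your turn" };
  const player = state.players[action.playerId];
  if (!player) return { ok: false, reason: "unknown player" };
  const card = player.hand.find((c) => c.id === action.cardId);
  if (!card) return { ok: false, reason: "card not in hand" };
  if (card.cost > player.mana) return { ok: false, reason: "not enough mana" };

  const next: PlayerState = {
    hand: player.hand.filter((c) => c.id !== card.id),
    mana: player.mana - card.cost,
    board: [...player.board, card],
  };
  return { ok: true, state: { ...state, players: { ...state.players, [action.playerId]: next } } };
}
```

Keeping the engine step pure like this leaves a delivery problem: getting the accepted state, or the changes that produced it, to both players.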
At this point, I was sure I wouldn’t be able to nail the design of the system on the first try. So I spoke to the team, and we decided to get something simple working and then iterate on it based on what we learned. To me, letting go of perfectionism is the right mindset when you’re working on something very complex.
For the first implementation, we leaned on practices we knew and created a unidirectional data flow with three components: a React app, the game engine packaged as a REST API, and Firestore.
The UI sends a request with the card the user wants to play, and the engine applies it if it’s valid. Then the engine updates the game state in Firestore. The React app subscribes to updates in Firestore and receives the updated game object. Under the hood, the Firebase client used polling to check for updates.
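The round trip can be sketched with in-memory stand-ins. This is an assumption-heavy simplification: `GameDocStore` plays the role of the Firestore document, `handlePlayCard` the engine’s REST endpoint, and `PollingView` the React app’s subscription. None of them use the real Firebase SDK.

```typescript
// Hypothetical, heavily simplified model of the first design:
// React app -> engine REST API -> Firestore document -> React app (via polling).
interface GameDoc {
  version: number;
  board: string[];
}

// Stand-in for the Firestore document: the engine overwrites the whole game object on each accepted play.
class GameDocStore {
  private doc: GameDoc = { version: 0, board: [] };

  read(): GameDoc {
    return this.doc;
  }

  write(board: string[]): void {
    this.doc = { version: this.doc.version + 1, board: [...board] };
  }
}

// Stand-in for the engine's REST handler: validate the play, then persist the full state.
function handlePlayCard(store: GameDocStore, hand: Set<string>, cardId: string): boolean {
  if (!hand.has(cardId)) return false; // invalid play: nothing is written
  hand.delete(cardId);
  store.write([...store.read().board, cardId]);
  return true;
}

// Stand-in for the React app's subscription: it only learns about a change on its next poll.
class PollingView {
  rendered: string[] = [];
  private lastVersion = 0;
  private readonly store: GameDocStore;

  constructor(store: GameDocStore) {
    this.store = store;
  }

  poll(): void {
    const doc = this.store.read(); // pulls the entire game object every time
    if (doc.version !== this.lastVersion) {
      this.lastVersion = doc.version;
      this.rendered = doc.board;
    }
  }
}
```

The view never hears about a play until its next `poll()`, and each poll reads the whole game object.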
Flaws of the Architecture
This design was easy to understand and simple to implement, but it had major flaws. The polling approach was inefficient, and pulling the entire board state each time made the requests slower. But what fascinated me was the direct impact the architecture had on the user experience.
There was a big delay between each user action and its visual result, since the request had to go through the game service, get persisted in storage, and then be picked up by the UI’s next poll. Every time a player did something, they had to wait for the board to update.
Because of the polling interval, the UI didn’t show events one by one, so it was normal to see multiple changes appear at once, making it unclear what was played and in what order. In other words, it was clunky.
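A small simulation shows why changes arrived in bursts. Assume, hypothetically, that the engine persisted a new full snapshot every time a card was played and that the client polled on a fixed interval. The illustrative `renderedVersions` function returns which snapshots the player would actually see.

```typescript
// Hypothetical timeline: the engine persisted a new full snapshot (version) at time `at`, in ms.
interface Write {
  at: number;
  version: number;
}

// Return the snapshot versions a client polling every `intervalMs` would render up to `untilMs`.
function renderedVersions(writes: Write[], intervalMs: number, untilMs: number): number[] {
  const rendered: number[] = [];
  let last = 0;
  for (let t = intervalMs; t <= untilMs; t += intervalMs) {
    // Each poll only sees the latest snapshot persisted so far; earlier ones were overwritten.
    const latest = writes.filter((w) => w.at <= t).reduce((v, w) => Math.max(v, w.version), 0);
    if (latest !== last) {
      rendered.push(latest);
      last = latest;
    }
  }
  return rendered;
}
```

With three cards played at 100, 150 and 180 ms, a fourth at 700 ms, and a 500 ms polling interval, `renderedVersions(writes, 500, 1000)` returns `[3, 4]`. Versions 1 and 2 are never drawn, and the first play shows up 400 ms after it was persisted, bundled with the two that followed it.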
This taught me the importance of understanding the product and the domain when you’re designing a system’s architecture. When we don’t take the UX into account, the end result is filled with workarounds and complexity that aim to patch the system’s flaws.
The UI is the end product, so it’s only natural that it should influence the engineering decisions of the whole system. But in practice, this is rarely the case. So we resort to practices like optimistic updates, applying the visual change of a user’s action immediately, to make the game feel snappy.
But such decisions introduce more problems. Optimistic updates create an inconsistency between the UI and the actual game state. If the player plays three cards but the game engine rejects the second one, the UI will have falsely shown that it was successful. This can lead to players wasting their cards and ruining their games.
To solve this, we have to push more and more logic into the UI, but the front-end is not the place to dump a system’s complexity. Needing such workarounds is a sign that the architecture was built in a generic way that doesn’t support the specifics of the product.
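To make the bookkeeping concrete, here is a minimal sketch of what an optimistic client has to track. The board is reduced to a list of card ids, and `OptimisticBoard` and its methods are hypothetical names. The displayed board is the confirmed state plus every play still waiting for the engine’s answer.

```typescript
// Hypothetical, simplified model: the board is just the list of card ids a player has played.
interface PendingPlay {
  requestId: number;
  cardId: string;
}

class OptimisticBoard {
  private confirmed: string[] = [];
  private pending: PendingPlay[] = [];
  private nextId = 1;

  // Show the card immediately and remember the request until the engine answers.
  play(cardId: string): number {
    const requestId = this.nextId++;
    this.pending.push({ requestId, cardId });
    return requestId;
  }

  // The engine accepted the play: move it into the confirmed state.
  confirm(requestId: number): void {
    const play = this.pending.find((p) => p.requestId === requestId);
    if (!play) return;
    this.confirmed = [...this.confirmed, play.cardId];
    this.pending = this.pending.filter((p) => p.requestId !== requestId);
  }

  // The engine rejected the play: drop it, even though the player already saw it succeed.
  reject(requestId: number): void {
    this.pending = this.pending.filter((p) => p.requestId !== requestId);
  }

  // What the player sees: the confirmed board plus every play still in flight.
  displayed(): string[] {
    return [...this.confirmed, ...this.pending.map((p) => p.cardId)];
  }
}
```

If the player plays `a`, `b` and `c`, the board shows all three; when the engine then rejects `b`, it silently disappears. The sketch only fixes the display. The player may already have made decisions based on `b` being on the board, and handling that is exactly the kind of logic that starts piling up in the UI.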
Nowadays, I start my projects from the UI and shape my data access patterns and logic from the interactions in it. The front-end is often seen as just the presentation layer, but at the end of the day, the UI is the product.
A question every engineer or architect will ask is: what if the product changes? What if it pivots, making the existing architecture unusable? If a product changes that much, it will probably make the design of its systems obsolete either way.
When presented with the choice, I’d rather optimize the system for the current state of the product instead of trying to predict the future.
Redesigning the System
I didn’t work on the game after the prototype, but I have an idea of how I would redesign it if I had the chance now.
When a player plays a card, the UI should wait for the game service to confirm the action. Meanwhile, the UI should be blocked, preventing the player from taking actions that might have to be invalidated. That removes the need for optimistic updates and all the problems that come with them.
The action then results in one or more events, which are streamed to both players through a message broker like Kafka. This removes the need for polling and eliminates the extra latency of going through the storage.
The storage should be a NoSQL store that holds the data in a format optimized for reading, so when a player reconnects, they can retrieve the whole game object efficiently. Analytics and leaderboards can live in a separate store that normalizes the data to make it searchable.
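A minimal sketch of that blocking behavior, as a hypothetical `ActionGate` the React app could consult before sending anything. It allows one action in flight at a time and unblocks input once the game service answers, whether it accepted or rejected the action. Networking is left out; in a component, `blocked` would drive something like a disabled state on the player’s hand.

```typescript
// Hypothetical client-side gate: at most one action in flight at a time.
type Phase = { kind: "idle" } | { kind: "awaiting"; requestId: number };

class ActionGate {
  private phase: Phase = { kind: "idle" };
  private nextId = 1;

  // Returns a request id to send along with the action, or null if input is currently blocked.
  begin(): number | null {
    if (this.phase.kind === "awaiting") return null;
    const requestId = this.nextId++;
    this.phase = { kind: "awaiting", requestId };
    return requestId;
  }

  // Called when the game service confirms or rejects the request; either way, input is unblocked.
  settle(requestId: number): void {
    if (this.phase.kind === "awaiting" && this.phase.requestId === requestId) {
      this.phase = { kind: "idle" };
    }
  }

  get blocked(): boolean {
    return this.phase.kind === "awaiting";
  }
}
```

Because the UI never shows a result the engine hasn’t confirmed, there is nothing to roll back.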
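The property the broker provides here is an ordered, append-only log that pushes each event to every subscriber. The sketch below mimics that with a hypothetical in-memory `GameEventLog`; it is not the Kafka client API, only an illustration of the model. With Kafka, one way to keep a game’s events in order would be to use the game id as the message key, so they all land in the same partition.

```typescript
// Hypothetical event shape.
interface GameEvent {
  type: string;
  playerId: string;
}

type Listener = (offset: number, event: GameEvent) => void;

// In-memory stand-in for one broker partition holding a single game's events.
// NOT the Kafka client API: it only mimics an ordered, append-only log with offsets.
class GameEventLog {
  private readonly events: GameEvent[] = [];
  private readonly listeners: Listener[] = [];

  // The game service appends every event produced by an accepted action.
  append(event: GameEvent): number {
    const offset = this.events.length;
    this.events.push(event);
    // Push the event to every subscriber immediately, in append order: nobody polls.
    for (const listener of this.listeners) listener(offset, event);
    return offset;
  }

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // A consumer that fell behind can replay everything from the last offset it processed.
  readFrom(offset: number): GameEvent[] {
    return this.events.slice(offset);
  }
}
```

Each event reaches both subscribers as soon as it is appended and in the order it was produced, which is what lets the UI show plays one by one.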
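One way to keep that read-optimized document up to date is to fold each event into it as it is produced, while writing the same event as a flat row for analytics. In this sketch, a `Map` stands in for the NoSQL store and an array for the analytics store; `project`, `loadGame`, and the event and document shapes are hypothetical.

```typescript
// Hypothetical event and document shapes.
interface CardPlayed {
  gameId: string;
  seq: number; // position of the event within its game
  playerId: string;
  cardId: string;
}

interface GameDocument {
  gameId: string;
  lastSeq: number;
  boards: Record<string, string[]>; // player id -> cards on that player's board
}

// Stand-ins: a Map for the read-optimized NoSQL store, an array for the analytics store.
const games = new Map<string, GameDocument>();
const cardPlays: CardPlayed[] = [];

// Fold one event into the denormalized game document and record it for analytics.
function project(e: CardPlayed): void {
  const doc: GameDocument = games.get(e.gameId) ?? { gameId: e.gameId, lastSeq: -1, boards: {} };
  if (e.seq <= doc.lastSeq) return; // already applied, e.g. an event delivered twice
  const board = doc.boards[e.playerId] ?? [];
  games.set(e.gameId, {
    ...doc,
    lastSeq: e.seq,
    boards: { ...doc.boards, [e.playerId]: [...board, e.cardId] },
  });
  cardPlays.push({ ...e }); // one flat row per play, easy to filter and aggregate
}

// On reconnect, a single key lookup returns everything needed to redraw the board.
function loadGame(gameId: string): GameDocument | undefined {
  return games.get(gameId);
}
```

Checking `lastSeq` lets the projection ignore an event the broker delivers more than once, as long as a game’s events arrive in order.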
The last thing we need to address is the actual events streamed to the players. We don’t want both players to receive the same events, since some information has to stay hidden from the opponent. Player A shouldn’t receive an event revealing which card player B has just drawn.
This filtering can be done in a service that doubles as a WebSocket gateway between the React applications and Kafka.
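A minimal sketch of that filtering step, assuming a hypothetical event shape in which `CARD_DRAWN` carries the drawn card’s id. `viewFor` decides what each player may see, and `fanOut` sends the filtered copy to every connected player; the `sockets` map of send functions stands in for real WebSocket connections.

```typescript
// Hypothetical event shape: CARD_DRAWN carries the drawn card, which only its owner may see.
interface GameEvent {
  seq: number;
  type: "CARD_DRAWN" | "CARD_PLAYED";
  playerId: string;
  cardId?: string;
}

// The version of an event that a given player is allowed to receive.
function viewFor(recipientId: string, e: GameEvent): GameEvent {
  if (e.type === "CARD_DRAWN" && e.playerId !== recipientId) {
    // The opponent learns that a card was drawn, but not which one.
    return { seq: e.seq, type: e.type, playerId: e.playerId };
  }
  return e;
}

// Stand-in for the gateway's fan-out: `sockets` maps each player id to a send function
// (in a real service this would be the player's WebSocket connection).
function fanOut(e: GameEvent, sockets: Map<string, (message: string) => void>): void {
  sockets.forEach((send, playerId) => {
    send(JSON.stringify(viewFor(playerId, e)));
  });
}
```

When player B draws a card, B receives the full event while player A gets the same sequence number and event type with the card id stripped, so both boards stay in sync without leaking hidden information.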
I could probably write at least three more articles about what I learned from this project, both technical and personal. But the lessons on architecture were invaluable.