Today, we will talk about storing code in a version control repository. It is an important discussion that has occupied the programming world for some time. I also think it is beneficial to look back at past decisions from time to time and ask whether they did us more good or harm. This way, we can avoid making poor judgments in the future. With that in mind, let's gather the experience we have on this issue and rethink our approach to it.
Suppose we have one or more teams of developers working on one or more projects. Let's also assume that the codebase is crucial to us and that we want to keep it tidy, so we can work comfortably for years to come. It is a pretty specific list of requirements, and not every development effort is like that. Some businesses have very different priorities, and it is key to understand them before deciding on the subject of today’s discussion.
What is it really about? Well, one of the first questions that arises before starting a new project is where we should keep our work and how to structure it as the codebase grows. Sooner or later, we can expect an additional business requirement, a new feature, a separate microservice, or a distinct build. All of this is still our project, yet from now on we have multiple entities that we can easily separate from the core of our application.
Monorepo is the approach where we keep all related, and sometimes unrelated, entities in a single code repository. We can have one huge have-it-all monorepo with the entire codebase of our company, or multiple smaller monorepos that each hold the elements of a certain project. When a single project includes many specific services and we store them in a single repository, then from the perspective of this project we have created a monorepo.
Multirepo is the opposite approach, where we separate all entities and store them in distinct repositories. In this strategy, granularity matters because we don’t want to go too far and end up with every utility in its own detached place. Usually, we draw the line at the service level, so every entity has its own build process, deployment, and purpose.
Now, we will focus on the practical implications of these two ideas.
A single source of truth in terms of code management is good at sharing things and poor at dividing them, so the rule of thumb is that if we have much to share, we should consider a single repository solution.
A monorepo lets us share many different things. Starting with the most obvious one, we can divide our code into logical modules or services and simply import features and utilities from one into another. The good part is that we get this for free: every entity lives in the same place, so an import does not require any setup. Still, we need to keep the dependencies clean so we don’t end up with spaghetti code.
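One way to keep free imports from turning into a tangle is to write down which module may depend on which and check it automatically. The following is only a minimal sketch under assumptions: the package names (`shared`, `accounts`, `billing`) and the rule table are hypothetical, and a real project would scan its files instead of a source string.

```python
import ast

# Hypothetical layering rules: the internal packages each package may import.
ALLOWED_IMPORTS = {
    "shared": set(),
    "accounts": {"shared"},
    "billing": {"shared"},
}


def imported_packages(source):
    """Return the top-level package names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def find_violations(package, source, rules=ALLOWED_IMPORTS):
    """List internal packages that `package` imports without permission."""
    internal = set(rules)
    permitted = rules.get(package, set())
    return sorted(
        name
        for name in imported_packages(source)
        if name in internal and name != package and name not in permitted
    )


# Hypothetical source of a file that lives in the billing package.
billing_source = "from shared.money import Money\nimport accounts.models\nimport os\n"
print(find_violations("billing", billing_source))  # prints ['accounts']
```

Here `billing` may use `shared`, but reaching straight into `accounts` is flagged. Third-party and standard library imports such as `os` are ignored because they are not in the rule table.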
This structure also supports atomic commits, where we can introduce a complete solution affecting multiple elements of the system. A common feature branch helps us get comprehensive feedback via code review and deploy the changes in a single coherent process described in the monorepo.
When we have a centralized repository, we can also share the quality requirements. Let’s consider utilities like an automated testing framework and the configuration for static code analysis. Everything we do in the repository is governed by the same decisions we made for the project. Wherever our programmers look and whatever they read, they see exactly the same quality processes in place. We can run these procedures in one coherent CI configuration that covers all the services that together make up the final application.
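To make the idea of one shared set of quality rules a bit more tangible, here is a small sketch that expands a single list of checks into a plan covering every service. The check names and service names are hypothetical placeholders, and a real pipeline would map each check to an actual tool invocation in its CI system.

```python
# Hypothetical check names; a real pipeline would map each to a tool invocation.
SHARED_CHECKS = ("lint", "static-analysis", "unit-tests")


def build_ci_plan(services, checks=SHARED_CHECKS):
    """Pair every service with every shared check, in a stable order."""
    return [(service, check) for service in sorted(services) for check in checks]


# Hypothetical services living side by side in the monorepo.
plan = build_ci_plan(["billing", "accounts"])
for service, check in plan:
    print(f"{service}: {check}")
```

The point of the design is that the list of checks exists in exactly one place, so adding a new service never means copying the quality setup.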
That kind of restriction also has an additional benefit. When we implement a solution and know that this is a shared space, we tend to care about it more. And when we do, more ideas appear to improve what we all value so much. This process of constant improvement affects the whole team, and that is something valuable to keep in mind.
The biggest single disadvantage is that keeping everything in good shape requires some work. We need to invest the most time at the very beginning to make sure that our code is logically separated and useful for sharing. We also should implement at least some of the processes we mentioned earlier to define the quality of code properly. Later, we don’t need as much effort to keep it up. Still, monorepo is never free of charge, and the maintenance process is required as our repository grows in size.
At the other end, we have the reverse situation. Multiple repositories give us strong separation for free and a much harder time when we try to share something between them. The satisfying thing is that setting up a new project is usually faster in a fresh repository. That’s why this approach is preferred when we want to create something new quickly.
It is obvious that the benefits of monorepo will be the downsides for multirepo and the other way around. Knowing that, let’s focus on the unique advantages of having separate repositories.
First of all, we profit from a default separation of concerns regarding both code and people. Every separate service, or even a small proof of concept, has its own space where new technology can be adopted without any preparation. Whether we are devising something new or updating a current build, we only have to care about this one detached repository. Carrying no technical debt is a tremendous advantage when we want to move fast.
Another feature is the ability to assign unique responsibilities and access for specific teams and programmers. That may be crucial when we have multiple entities developed by different vendors, and we simply cannot share everything with everyone.
We also do not have to worry as much about version control performance, which can degrade in a monorepo as projects keep growing. When we design our architecture as separate spaces, we have more flexibility with the massive data structures that sometimes have to be incorporated into our code.
The main disadvantage of the multirepo approach is not really about code reuse, though. Of course, sharing is problematic, but if we want to, there are ways to prepare a well-designed development environment and fast production build pipelines. However, the more independent pieces we have to combine, the more time we need to synchronize them. Good automation may help a lot, yet repeating all the standard procedures of implementation, code review, and deployment in each repository simply takes valuable time away from the programmers.
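A typical synchronization chore is noticing that repositories have drifted apart on a shared library. As a rough sketch, assuming we have already collected the version each repository pins (the repository names and version numbers below are made up), a small helper can report who is lagging behind:

```python
def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_outdated(pinned_versions):
    """Return {repo: version} for repos pinned below the newest version in use."""
    newest = max(parse_version(version) for version in pinned_versions.values())
    return {
        repo: version
        for repo, version in pinned_versions.items()
        if parse_version(version) < newest
    }


# Hypothetical pins of one shared library across three repositories.
pins = {
    "billing-service": "1.4.0",
    "accounts-service": "1.10.0",
    "web-frontend": "1.10.0",
}
print(find_outdated(pins))  # prints {'billing-service': '1.4.0'}
```

Versions are compared as tuples of numbers, not as strings, because a plain string comparison would wrongly rank `1.4.0` above `1.10.0`. In a monorepo this report is usually unnecessary, since every service builds against the same copy of the shared code.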
There is no definitive answer to how to store your code. The decision between monorepo and multirepo should be pragmatic, based on empirical facts and requirements specific to your team and project. Sometimes, it is easier to start with multirepo and then merge services as projects mature and become more predictable.
From my perspective, monorepo has great benefits that you cannot replace with anything else. Still, they only matter as long as you do not value the default separation of multirepo more. This subject will remain an open discussion for generations to come. Cheers!