Source code control and various code versioning products have been around for many years, so this may seem like an outdated topic. However, products evolve, and when working with different teams, I still find that while people use these products, their understanding of them is often shallow, and that leads to challenges in using them effectively. Among all the things a developer is supposed to know and master, source control knowledge somehow seems to take a back seat. With the increased use of DevOps and CI/CD pipelines, this limited knowledge can cause serious issues in automated deployments, because the chances of the wrong code being released become high.
In this blog post, I want to talk about the prominent source control products and how they differ from each other in the way they manage code and code branches. At heart, they share a common purpose: keeping versions of code files, pretty much for the lifetime of the project. However, the way they manage those versions differs, and this can cause some grief to people who switch from one to another without understanding the differences.
More than two decades ago, I started working with Visual SourceSafe (VSS), and a few years later I transitioned to Team Foundation Server (TFS). If you are still using VSS, then God save you! All I can say is that you are grossly outdated, and it would be worthwhile to migrate to a newer product ASAP.
In the last few years I have had exposure to Subversion (SVN) and most recently to Git. Moving to Git wasn’t a trivial shift, as there are some fundamental differences in how Git operates. While we are on Git, do note that Git is the source control system and GitHub is just a hosting service for Git repositories. Similar services such as Atlassian Bitbucket, Microsoft Azure DevOps and GitLab also exist, but GitHub has become the popular choice for many organizations, including Microsoft, which now owns the service.
Earlier source control systems such as TFS were based on the well-established client-server architecture, where the central repository is hosted on a server and all developers/team members connect to it as clients. Each developer either gets a read-only copy of the files, to view them, build them and generally look at them, or gets a writable copy (called ‘checking out’) when he/she wants to edit a file. The developer makes changes to the code as required, hopefully unit tests it, and then checks the code back in to the server. Check-in rules can be set to ensure that the code builds properly, automated tests pass and so on, but that’s for a different time. For now, we are just looking at the fact that after changes, the code is checked back in to the server.
While checking in the code, there could be conflicts. It may so happen that while developer A was working on a code file, developer B also worked on the same file, possibly the same function, and has already checked in his modifications. When developer A then tries to check in, the system finds that the same lines have already changed on the server and reports conflicts. Conflict resolution is an important task, and if not done correctly it can cause severe damage to the progress of the project. I intend to cover this in more detail later.
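To make the ‘same lines’ check concrete, here is a minimal Python sketch of a three-way, line-by-line comparison between the version the developer started from (base), the version now on the server, and the developer’s own copy. It is a toy that assumes neither developer adds or removes lines; real products use proper diff algorithms, and the function name check_in is hypothetical.

```python
from typing import List, Tuple


def check_in(base: List[str], server: List[str], mine: List[str]) -> Tuple[List[str], List[int]]:
    """Merge `mine` into the current `server` version, starting from `base`.

    Returns the merged lines and the indices of lines both sides changed
    differently. Assumes all three versions have the same number of lines.
    """
    if not (len(base) == len(server) == len(mine)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, theirs, ours) in enumerate(zip(base, server, mine)):
        if ours == theirs or ours == b:
            merged.append(theirs)  # unchanged by us, or both made the same edit
        elif theirs == b:
            merged.append(ours)    # only we changed this line
        else:
            merged.append(theirs)  # both changed it differently: keep server text
            conflicts.append(i)    # and flag the line for manual resolution
    return merged, conflicts


base = ["def total(x):", "    return x", ""]
from_b = ["def total(x):", "    return x + 1", ""]  # B checked in first
from_a = ["def total(x):", "    return x * 2", ""]  # A edited the same line

merged, conflicts = check_in(base, from_b, from_a)
print(conflicts)  # [1]
```

Lines changed on only one side merge cleanly; only line 1, which both A and B touched, is reported as a conflict.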
Most server-based systems also allow locking of files, in other words, single checkouts. This means that if developer A has already checked out a file for modification, developer B cannot check it out and must wait until developer A checks the code back in. While this makes life easier because there are no conflicts to worry about, it could impact the delivery schedule if developer A takes longer than planned, which in turn delays developer B’s work.
As a best practice, developers are expected to check in frequently, thus unlocking files often and helping ensure that conflicts don’t build up. The server has a clear idea of which files are being worked on by which developers, and typically before a release, the configuration controller ensures that all developers have checked in their files. Once all files are checked in, a build can be performed and deployed to the DEV/UAT/PROD environment, as the case may be. Repositories also have the concept of a branch, which can be used to segregate code for different environments, different clients, or things like hot fixes. This again is a topic for another discussion.
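The single-checkout behaviour and the pre-release check can be sketched as a toy Python model: the server refuses a second check-out up front, a check-in releases the lock, and the list of pending check-outs is what a configuration controller would review before a release. CentralServer, CheckoutDenied and the file name are hypothetical; this is not the TFS API.

```python
from typing import Dict


class CheckoutDenied(Exception):
    """Raised when a file is already checked out by someone else."""


class CentralServer:
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}        # path -> latest checked-in content
        self.checked_out: Dict[str, str] = {}  # path -> developer holding the lock

    def check_out(self, path: str, developer: str) -> str:
        holder = self.checked_out.get(path)
        if holder is not None and holder != developer:
            # The second developer is blocked before editing anything.
            raise CheckoutDenied(f"{path} is checked out by {holder}")
        self.checked_out[path] = developer
        return self.files.get(path, "")

    def check_in(self, path: str, developer: str, content: str) -> None:
        if self.checked_out.get(path) != developer:
            raise CheckoutDenied(f"{developer} has not checked out {path}")
        self.files[path] = content
        del self.checked_out[path]  # checking in releases the lock

    def pending_checkouts(self) -> Dict[str, str]:
        """What a configuration controller would review before a release."""
        return dict(self.checked_out)


server = CentralServer()
server.check_out("Billing.cs", "A")
try:
    server.check_out("Billing.cs", "B")
except CheckoutDenied as err:
    print(err)  # Billing.cs is checked out by A
print(server.pending_checkouts())  # {'Billing.cs': 'A'}
server.check_in("Billing.cs", "A", "class Billing {}")
print(server.pending_checkouts())  # {}
```

Because the server tracks every check-out, "is everything checked in?" is a single query rather than a round of emails.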
SVN and TFS are client-server-based products and work in the model described above. As source control systems, they offer a lot more features in terms of team management, build policies, release planning and integration with DevOps. However, I guess you get the sense by now… the details on this are for another discussion :). Here I am just trying to focus on the basic architecture of the products.
Having used TFS for close to two decades, moving to SVN wasn’t all that difficult, as it had similar basic concepts. Some differences did exist. SVN doesn’t have a check-out concept in the TFS sense (an SVN ‘checkout’ simply creates a local working copy). A developer has a set of local files and just starts working on them. Due to this, other developers will not really know whether anyone else in the team is working on the same file as they are. With TFS, thanks to the check-out feature, it was evident to everyone in the team who was working on which file.
Access to the server is managed via client tools like TortoiseSVN. It gives a visual indication of files that have been modified (like TFS), and when committing (the equivalent of check-in in TFS), it helps identify the modified files, allows the developer to selectively commit files, and lets him/her enter a comment explaining the changes made.
Similar to the single check-out option in TFS, SVN allows locking of files. In this case, only the developer who has locked a file can commit changes to it; other developers get a commit failure if they try. The catch is that the failure happens at commit time and not when the developer starts to modify the file. This also means that, unless files are locked, looking at the server one cannot find out which files are being modified. Hence, before a release, the configuration controller may pretty much have to reach out to each developer in the team and make sure that everyone has committed their files.
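By contrast, here is a toy Python sketch of the SVN-style behaviour just described: local edits are never blocked, and the lock only bites when a commit reaches the server. The Repository, WorkingCopy and RepositoryError names are hypothetical, not the Subversion API; releasing the lock on a successful commit mirrors Subversion’s default behaviour.

```python
from typing import Dict


class RepositoryError(Exception):
    """Raised when the server rejects a lock or a commit."""


class Repository:
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.locks: Dict[str, str] = {}  # path -> lock owner

    def lock(self, path: str, developer: str) -> None:
        owner = self.locks.get(path)
        if owner is not None and owner != developer:
            raise RepositoryError(f"{path} is already locked by {owner}")
        self.locks[path] = developer

    def commit(self, path: str, developer: str, content: str) -> None:
        owner = self.locks.get(path)
        if owner is not None and owner != developer:
            # The problem only surfaces now, after the edit has been made.
            raise RepositoryError(f"{path} is locked by {owner}")
        self.files[path] = content
        if owner == developer:
            del self.locks[path]  # as in SVN's default, a commit releases the lock


class WorkingCopy:
    """A developer's local files; editing them never contacts the server."""

    def __init__(self, repo: Repository, developer: str) -> None:
        self.repo = repo
        self.developer = developer
        self.local: Dict[str, str] = dict(repo.files)

    def edit(self, path: str, content: str) -> None:
        self.local[path] = content  # always succeeds, locked or not

    def commit(self, path: str) -> None:
        self.repo.commit(path, self.developer, self.local[path])


repo = Repository()
repo.files["report.py"] = "v1"
dev_a = WorkingCopy(repo, "A")
dev_b = WorkingCopy(repo, "B")

repo.lock("report.py", "A")
dev_b.edit("report.py", "v2 from B")  # no warning while editing
try:
    dev_b.commit("report.py")
except RepositoryError as err:
    print("B:", err)  # B: report.py is locked by A
dev_a.edit("report.py", "v2 from A")
dev_a.commit("report.py")
```

Developer B has already done the work by the time the lock is discovered, which is exactly the drawback compared with a TFS-style check-out.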
So much for SVN and TFS and their client-server architecture. Let’s now look at Git and see how significantly it differs from these products. It takes a while to get used to its radically different concept. On the face of it, Git is like SVN and TFS in that it also has a client and a server presence, and the server is where the code repository is. However, in this case, the client also holds a full repository. Developers begin by taking a local copy of the repository, called cloning. Then they create what is called a local branch and work on the code as per the requirements. The developer continues to code and test locally (in the local branch) without really going back to the server. The good part about Git is that it pushes developers to use branches early on, and there are benefits to that.
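A toy Python model of this distributed setup: cloning copies the whole history, commits on a local branch never touch the server, and the server only sees the branch once it is pushed. The Repo class is hypothetical and ignores how Git actually stores commits; its push simply overwrites the remote branch, whereas real Git rejects non-fast-forward pushes by default.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repo:
    # branch name -> commit messages, oldest first
    branches: Dict[str, List[str]] = field(default_factory=dict)

    def clone(self) -> Repo:
        # Cloning copies the entire history, not just the latest files.
        return Repo(copy.deepcopy(self.branches))

    def create_branch(self, name: str, start: str) -> None:
        self.branches[name] = list(self.branches[start])

    def commit(self, branch: str, message: str) -> None:
        self.branches[branch].append(message)  # local only: the server is untouched

    def push(self, branch: str, remote: Repo) -> None:
        # Simplified: overwrites the remote branch with the local one.
        remote.branches[branch] = list(self.branches[branch])


server = Repo({"dev": ["initial commit"]})
local = server.clone()
local.create_branch("feature/login", "dev")
local.commit("feature/login", "add login form")
local.commit("feature/login", "validate password")
print("feature/login" in server.branches)  # False: nothing pushed yet
local.push("feature/login", server)
print(server.branches["feature/login"])
```

The point of the model is the asymmetry: the developer can commit as often as they like, and the server only learns about that work when it is explicitly pushed.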
SVN and TFS also have a concept of branching, but from my experience, branching in these products is used in the context of environments (DEV/UAT/PROD) or customers. So I would have a main (trunk) branch, or a branch specific to a customer. Production releases happen off these branches, by creating a build and then pushing it to production. The code is typically tagged so that it is easy to pull out the specific code version that is deployed on production.
Unlike this, branching in Git is typically feature-oriented. There is a primary dev branch, off which developers create their own branches, each representing a specific feature implementation they are working on. On completion, these are merged back into the dev branch and then into master, from which the production deployment happens. The individual developer branches can be deleted once they are merged back into dev. Getting the developer’s code from the local machine into dev is done in two steps. First, it is committed to the local branch and pushed to the server branch of the same name (a commit on its own stays on the local machine until it is pushed). Then it is merged into the dev branch, typically via a pull request.
In a team environment, and as a best practice, code review is very likely to be a requirement. Git hosting services such as GitHub make it easy to integrate a review process with code changes. In this case, a developer raises a pull request to merge his/her branch into the dev branch. The code isn’t merged into the branch directly but awaits approval. The idea is that someone from the team reviews the code and then approves it. Once approved, the code gets merged into the branch against which the pull request was raised.
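The approval gate can be sketched in Python as follows. PullRequest and its methods are hypothetical and not any hosting service’s API; the rule that authors cannot approve their own request and the single required approval are assumed policies, which real services let you configure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PullRequest:
    source: str
    target: str
    author: str
    approvals: Set[str] = field(default_factory=set)
    merged: bool = False

    def approve(self, reviewer: str) -> None:
        if reviewer == self.author:
            raise ValueError("authors cannot approve their own pull request")
        self.approvals.add(reviewer)

    def merge(self, branches: Dict[str, List[str]], required: int = 1) -> None:
        if len(self.approvals) < required:
            raise PermissionError(f"needs {required} approval(s), has {len(self.approvals)}")
        # Bring over the source commits the target branch does not have yet.
        for commit in branches[self.source]:
            if commit not in branches[self.target]:
                branches[self.target].append(commit)
        self.merged = True


branches = {"dev": ["c1"], "feature/invoice": ["c1", "c2"]}
pr = PullRequest("feature/invoice", "dev", author="A")
try:
    pr.merge(branches)
except PermissionError as err:
    print(err)  # needs 1 approval(s), has 0
pr.approve("B")
pr.merge(branches)
print(branches["dev"])  # ['c1', 'c2']
```

The merge itself is trivial here; what matters is that it cannot happen until a reviewer other than the author has signed off.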
One aspect I personally haven’t been able to get past yet is the term ‘pull request’ (PR). In Git’s own parlance, pull is the act by which a developer gets the latest code from the server to the local machine, and push is when the developer’s local code is sent to the server. So ideally it should have been called a ‘push request’, and I have seen many queries in the forums about this. (The name reflects the reviewer’s side: you are asking the owners of the target branch to pull your changes in.) Long story short, we need to live with this naming and just understand that a pull request really means getting code into some branch by first having someone in the team approve it.
I mentioned earlier that Git kind of enforces the use of branches by developers, which is a good thing. Consider developers A and B working on some features. As part of typical organizational practice, they check in their code at the end of the day. With systems like TFS/SVN, where people don’t tend to use feature-level branches, both A and B will check in their code to the main branch itself. If A’s work is incomplete but B’s is complete, and B wants to test the feature, he/she will end up creating a build that includes A’s incomplete work. This might cause issues during testing and lead to incorrect inferences. With Git, on the other hand, developers A and B work on their own branches, and their commits go to those branches. When B is ready to test, a PR can be raised to merge into the dev branch, and only B’s code will be merged and tested. A’s half-baked code doesn’t interfere at all.
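The A-and-B scenario reduces to a few lines of Python, assuming branches are plain lists of (developer, change, complete) entries; the names are hypothetical and only show which work ends up in B’s test build.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, str, bool]  # (developer, description, is_complete)


def build(branch: List[Change]) -> List[str]:
    """Changes a build of this branch would contain."""
    return [desc for _, desc, _ in branch]


def incomplete(branch: List[Change]) -> List[str]:
    """Unfinished changes that would leak into a build of this branch."""
    return [desc for _, desc, done in branch if not done]


# TFS/SVN habit: both developers check in to the shared main branch.
main: List[Change] = [
    ("A", "half-done tax calculation", False),
    ("B", "finished PDF export", True),
]
print(build(main))       # ['half-done tax calculation', 'finished PDF export']
print(incomplete(main))  # ['half-done tax calculation']

# Git habit: each developer commits to a feature branch.
branches: Dict[str, List[Change]] = {
    "dev": [],
    "feature/A": [("A", "half-done tax calculation", False)],
    "feature/B": [("B", "finished PDF export", True)],
}
branches["dev"].extend(branches["feature/B"])  # only B's PR is merged
print(build(branches["dev"]))       # ['finished PDF export']
print(incomplete(branches["dev"]))  # []
```

The only difference between the two halves is where the end-of-day commits land, and that alone decides whether B tests a clean build.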
This brings me to the end of this blog post. I will try to cover the points I left open above in later posts. Keep watching this space for them.