pieterh wrote on 09 May 2012 21:49
One of git's great features is how easy it makes branches. Almost all git projects use branches, and the selection of the "best" branching strategy is like a rite of passage for an open source project. Vincent Driessen's git-flow is maybe the best known. It has 'base' branches (master, develop), 'feature' branches, 'release' branches, 'hotfix' branches, and 'support' branches. Many teams have adopted git-flow, which even has git extensions to support it. However, in this article I'll argue that public git branches are harmful, based on experience and evidence, and propose a branch-free approach, based on forks.
Background
Let me start with my credentials. My first open source project was Libero from 1991. I wrote Xitami, a popular open source web server, and killed that in 2001. I wrote most of OpenAMQ, the first AMQP implementation. I founded and steered the ZeroMQ community and have maintained its stable releases for years. If there is one thing I know really well, it's how to build excellent software.
Git is a revolution, especially when combined with github. In the last year or two, the github/git combination has become a key tool for organizing teams, and building processes like C4 and PC3 that are (as far as I know) the first reusable contracts of their kind.
Here is a section of PC3 that will shock some people:
- The project SHALL have one branch ("master") that always holds the latest in-progress version and SHOULD always build.
- The project SHALL NOT use topic branches for any reason. Personal forks MAY use topic branches.
- To make a stable release someone SHALL fork the repository by copying it and thus become maintainer of this repository.
To be clear, it's public branches in shared repositories that I'm talking about. Using branches for private work, e.g. to work on different issues, appears to work just fine.
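As a minimal sketch of the third rule, here is how a stable fork might be made and then fed with fixes using plain git commands. The repository names (project, project-stable), the file names, and the use of cherry-pick to bring a fix across are assumptions for illustration; PC3 itself only says that someone forks the repository by copying it.

```shell
#!/bin/sh
# Sketch only: every repository and file name below is hypothetical.
set -eu

# Identity for the throwaway commits, so the script does not rely on global git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

work=$(mktemp -d)
cd "$work"

# The development repository: one branch holding the latest in-progress version.
git init -q project
cd project
echo "hello" > lib.txt
git add lib.txt
git commit -q -m "Initial version"
cd ..

# Make a stable release by copying the repository; whoever does this maintains the copy.
git clone -q project project-stable

# Development carries on upstream: one experimental change, then one bug fix.
cd project
echo "work in progress" > experimental.txt
git add experimental.txt
git commit -q -m "Start experimental feature"
echo "hello, fixed" > lib.txt
git commit -q -am "Fix bug in lib"
fix=$(git rev-parse HEAD)
cd ..

# The stable maintainer takes only the fix (assumption: via cherry-pick).
cd project-stable
git fetch -q origin
git cherry-pick "$fix" >/dev/null

# Report what ended up in the stable fork.
stable_lib=$(cat lib.txt)
if [ -e experimental.txt ]; then stable_has_experiment=yes; else stable_has_experiment=no; fi
stable_branches=$(git branch --list | wc -l | tr -d ' ')
echo "stable lib.txt: $stable_lib"
echo "experimental code in stable fork: $stable_has_experiment"
echo "local branches in stable fork: $stable_branches"
```

The stable fork ends up with the fix but not the experimental file, and neither repository needed a second branch.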
The PC3 text is not accidental. This section came from trial-and-error, mainly in the ZeroMQ community. Originally, when Martin Sustrik and I (the pragmatic core developers) started using forks instead of branches for ZeroMQ's stable versions, many people reacted with shock and horror. Today, the response is less emotional. Tomorrow, I think it'll be clear that branches were, in fact, an entirely wrong approach inherited from the days of Subversion and monolithic repositories.
More profoundly, the branches vs. forks argument is really a wider design vs. evolve argument about how to make software optimally (both PC3 and C4 fully embrace the "evolve" approach). I may address that wider argument in a future article.
To make my argument here, I'll look at a number of criteria, and compare branches and forks in each one.
Complexity
The simpler, the better.
There is no inherent reason branches are more complex than forks. However, git-flow uses five types of branch, whereas PC3 uses two types of fork (development, and stable) and one branch (master). Circumstantial evidence is that branches lead to more complexity than forks. For naive users, it is definitely easier to learn to work with many repositories and no branches.
Learning Curve
The smoother the learning curve, the better.
Evidence definitely shows that learning to use git branches is complex. For some people this is OK. For most developers, every cycle spent learning git is a cycle lost on more productive things. I've been told several times, by different people, that I do not like branches because I "never properly learned git". That is fair but it is a criticism of the tool, not the human.
Cost of Failure
The lower the cost of failure, the better.
Branches demand more perfection from developers since mistakes potentially affect others. This raises the cost of failure. Forks make failure extremely cheap since nothing that happens in a fork can affect others not using that fork.
Upfront Coordination
The less need for upfront coordination, the better.
You can do a hostile fork. You cannot do a hostile branch. Branches depend on upfront coordination, which is expensive and fragile. One person can veto the desires of a whole group. In the ZeroMQ community for example we were unable to agree on a git branching model for a year. We solved that by using forking instead. The problem went away.
Scalability
The more you can scale a project, the better.
The strong assumption in all branch strategies is that the repository is the project. But there is a limit to how many people you can get in agreement to work together in one repository. As I explained, the cost of upfront coordination can become fatal. A more realistic project scales by allowing anyone to start their own repositories, and ensuring these can work together. A project like ZeroMQ has dozens of repositories. Forking looks more scalable than branching.
Surprise and Expectations
The less surprising, the better.
People expect branches and find forks to be uncommon and thus confusing. This is the one aspect where branches win. However, it's also a reason for sticking to FORTRAN and COBOL. We do not refuse innovation just because it's surprising.
Economics of Participation
The more tangible the rewards, the better.
A fully free process like PC3/C4 lets people organize around problems. Most organizations are not ready for such a radical management approach. But even a top-down approach needs people to feel rewarded for their work. Branches don't act like "product" but like "discrete variations of product". People have less interest in contributing to a discrete variation. Whereas everyone wants their name on a successful product. So the economics of branches are worse than the economics of forks.
Robustness in Conflict
The more a model can survive conflict, the better.
Like it or not, people fight over ego, status, belief. If your organizational model depends on agreement, you won't survive the first real fight. Branches do not survive real arguments and fights. Whereas forks can be hostile, and still benefit all parties. And this is indeed how free software works. Score one for forks, zero for branches.
Guarantees of Isolation
The stronger the isolation between production code and experiment, the better.
People make mistakes. I've seen experimental code pushed to mainline production by error. I've seen people make bad panic changes under stress. But the real fault is in allowing two entirely separate generations of product to exist in the same protected space. If you can push to random-branch-x you can push to master. Branches do not guarantee isolation of production critical code. Forks do.
Visibility
The more visible our work, the better.
Forks have watchers, issues, a README, a wiki. Branches have none of these. People try forks, build them, break them, patch them. Branches sit there until someone remembers to work on them. Forks have downloads and tarballs. Branches do not. When we look for self-organization, the more visible and declarative the problems, the faster and more accurately we can work.
Conclusions
Git branches are, in my experience and in shared repositories, harmful. It is better to work with a branch-free process that uses forks for stabilization. This comes from some years of trial and error on a wide range of projects. We have systematically found forks to be cheaper and safer and easier than branches. Branch-free processes like C4 and PC3 are real, and they work, in anger, both on closed source and open source projects. The only downside of a branch-free process seems to be that it shocks people with previous git experience. This is a passing effect, in our experience.
Comments
How much complexity do you really save?
With branching:
Without branching:
Save two trivial steps? Am I missing something?
You're describing a local branch, aren't you? As I said at the start of the post,
Furthermore, what are those two extra steps for? If they're not solving a real problem, then it's just dancing around for nothing.
No, I was talking about the workflow of first forking the project, then creating a new branch in your fork, pushing that branch to your public fork, and doing a pull request from that public branch. That seems to be the recommended workflow on most projects. Those two steps allow you to work in your own branch so you aren't messing with master. But probably what I describe above is still a "forking" workflow and what you recommend. I guess I have misread what you are suggesting.
Right, this is the forking workflow plus the extra (and IMO useless) private branch. It's now been… 750 days since I wrote this article and I've not needed to use branches once. Two things, above all else, appear when you eliminate branches:
I think I'm even more fanatical than before on this: even private branches are a Bad Thing because they break the model of "small atomic changes for small explicit problems" that we've learned works best for software development.
Here is an interesting (if slightly dramatic) post from Steve Bennett about the ways Git is needlessly difficult to learn and use. It turns out that by not using branches, and sticking to a fork + pull request model, we no longer need to learn much about Git's internal model.
I don't think I would say that branches are evil. I consider how people use branches evil. There needs to be a balance. I find branches work best when they are short lived. Forks work best for long lived "branches". In a sense, they are branches. If you don't use branches, it can make Git painful to work with when you have a number of people touching the same set of files. The temporary branches make the merges explicit (instead of trying to fast forward automatically).
I think you're right… short-lived branches can work, whereas long-lived ones are needlessly expensive. Your last sentence, however, blew my git fuse: it suggests that in order to collaborate I need to learn what "fast forward" vs "explicit merge" means, and that is a real problem when we try to use git for a wider audience.
Without trying to explain what those mean, I will point out something fairly important.
Pulling from a fork (or from *any* other repo) automatically creates an explicit merge.
Merging a branch *might* create an explicit merge or it might fast-forward. However, if you use the --no-ff flag, it will always create an explicit merge.
Therefore, pulling from a fork is functionally identical to merging from a branch with --no-ff.
There's a bit of a holy war on whether --no-ff should be the default behavior for merges.
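For readers who have not met these terms, here is a minimal sketch of the difference in a throwaway repository. The branch names (topic-a, topic-b) and the file contents are made up for illustration; git merge --ff-only, git merge --no-ff, and git rev-list --merges --count are standard git commands.

```shell
#!/bin/sh
# Sketch only: branch names and file contents are hypothetical.
set -eu

# Identity for the throwaway commits, so the script does not rely on global git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
base=$(git rev-parse --abbrev-ref HEAD)   # default branch name varies between git setups

# Case 1: fast-forward. The base branch has not moved, so git just advances
# its pointer to the tip of topic-a and records no merge commit.
git checkout -q -b topic-a
echo "two" >> file.txt
git commit -q -am "Change on topic-a"
git checkout -q "$base"
git merge -q --ff-only topic-a
merges_after_ff=$(git rev-list --merges --count HEAD)

# Case 2: explicit merge. Same situation, but --no-ff forces a merge commit
# with two parents, so the history shows that a merge took place.
git checkout -q -b topic-b
echo "three" >> file.txt
git commit -q -am "Change on topic-b"
git checkout -q "$base"
git merge -q --no-ff -m "Merge topic-b" topic-b
merges_after_no_ff=$(git rev-list --merges --count HEAD)

echo "merge commits after fast-forward: $merges_after_ff"
echo "merge commits after --no-ff: $merges_after_no_ff"
```

The first count is 0 and the second is 1: the file contents end up the same either way, and only the shape of the recorded history differs.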
So the holy war on --no-ff goes away when we stop using branches. Seems right.