Git: "no sync" branch management for contribution forks

May 15, 2021 git gitlab github

Not interested in explanations? Fast travel to the tl;dr 🚀

Open source communities on collaborative Git platforms (such as GitHub and GitLab) usually ask contributors to submit changes to the upstream repository from a fork.

Unlike an actual fork, which branches off from upstream and has a life of its own, a contribution fork is merely a container for contribution branches: its lifecycle is tightly coupled with the upstream. Contributors are expected to submit changes based on a recent version of the upstream repository. Should the fork get outdated, they must sync with the upstream and rebase their work.

This is where things get murky. There are as many synchronization approaches as there are contributors. GitHub even has an entry in the documentation dedicated to syncing a fork.

For my part, I believe that a “no sync” approach is best 😄

Observation

Two branch types on a contribution fork:

Mirror branches: branches present both in the upstream and the fork, which must be kept in sync with upstream (usually, the default branch and release branches).
Contribution branches: branches present only in the fork, which contain the changes that will be submitted to the upstream.

In practice, mirror branches are useless: they only serve as a starting point for contribution branches. Keeping them in sync with the upstream generates branch management overhead.

Could we contribute without needing to sync the fork at all? This is the basis behind the “no sync” approach.

Context

Setting up the `upstream` remote

This step is common with regular sync approaches.

By default, when we clone a repository, an origin remote pointing to the source repository is registered in the local repository. Here, I am cloning a personal fork of cilium/cilium:

$ git clone git@github.com:nbusseneau/cilium.git
Cloning into 'cilium'...
[...]
$ cd cilium
$ git remote -v
origin  git@github.com:nbusseneau/cilium.git (fetch)
origin  git@github.com:nbusseneau/cilium.git (push)

A default local branch is automatically checked out, tracking the default branch on origin:

$ git branch -vv
* master 1070b19ab [origin/master] Add missing Demo App reference

As it is a fork, we can switch to any mirror branch available:

$ git switch v1.10
Branch 'v1.10' set up to track remote branch 'v1.10' from 'origin'.
Switched to a new branch 'v1.10'
$ git branch -vv
  master 1070b19ab [origin/master] Add missing Demo App reference
* v1.10  75b4ed957 [origin/v1.10] build(deps): bump docker/setup-buildx-action

Note: in this post I will use git switch over git checkout, but you can use either.

Since a Git repository may interact with any number of remote Git repositories, let’s add the upstream repository as the upstream remote:

$ git remote add upstream git@github.com:cilium/cilium.git
$ git remote -v
origin  git@github.com:nbusseneau/cilium.git (fetch)
origin  git@github.com:nbusseneau/cilium.git (push)
upstream        git@github.com:cilium/cilium.git (fetch)
upstream        git@github.com:cilium/cilium.git (push)

We can now fetch branches from the upstream:

$ git fetch upstream
[...]
From github.com:cilium/cilium
 * [new branch]          master                                     -> upstream/master
 * [new branch]          v1.9                                       -> upstream/v1.9
 * [new branch]          v1.10                                      -> upstream/v1.10
 [...]
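
With both remotes fetched, the two branch types from the Observation section can be told apart mechanically. The following is only a minimal sketch: `classify_fork_branches` is a hypothetical helper name (not part of Git), and it assumes the remotes are named origin and upstream and that both have been fetched recently.

```shell
# classify_fork_branches: hypothetical helper, not part of Git.
# Prints "mirror <name>" or "contribution <name>" for each branch of the fork,
# based on the remote-tracking refs already fetched from origin and upstream.
classify_fork_branches() {
  git for-each-ref --format='%(refname)' refs/remotes/origin |
  while read -r ref; do
    name="${ref#refs/remotes/origin/}"
    # Skip the symbolic origin/HEAD pointer, which is not a real branch
    if [ "$name" = "HEAD" ]; then
      continue
    fi
    if git show-ref --verify --quiet "refs/remotes/upstream/$name"; then
      echo "mirror $name"
    else
      echo "contribution $name"
    fi
  done
}
```

Running `classify_fork_branches` inside the clone lists every fork branch with its type. It only looks at refs that are already local, so it reflects the state of the last fetch.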

Sync approaches

Most guides recommend keeping the mirror branches of a fork in sync with upstream using a pull-push pattern:

Pull (or merge, or rebase) a local mirror branch (e.g. master) from the upstream branch (e.g. upstream/master).
Push to the fork mirror branch (e.g. origin/master):

$ git switch master
$ git pull upstream master
$ git push

This last bit is precisely what we are not going to do: we are never going to sync mirror branches on the fork.

No sync approaches

We are going to present three no sync approaches:

Upstream-tracking branches
Fetch-only upstream branches
Upfetch

Upstream-tracking branches

This first approach is the simplest. We start with a repository as outlined above, with origin and upstream remotes set up. Let’s have a look at the .git/config file:

[branch "master"]
        remote = origin
        merge = refs/heads/master
[branch "v1.10"]
        remote = origin
        merge = refs/heads/v1.10

The local master and v1.10 branches are both tracking mirror branches on the fork (origin remote). We are going to bypass the fork and work directly with upstream.

Tracking upstream on existing branches

To have an existing branch track the upstream remote, we can do any of the following:

Manually edit .git/config and replace remote = origin with remote = upstream.
git config branch.<BRANCH_NAME>.remote upstream
git branch <BRANCH_NAME> -u upstream/<BRANCH_NAME>

$ git branch master -u upstream/master
Branch 'master' set up to track remote branch 'master' from 'upstream'.
$ git config branch.v1.10.remote upstream
$ git branch -vv
  master 1070b19ab [upstream/master: behind 104] Add missing Demo App reference
* v1.10  75b4ed957 [upstream/v1.10: behind 96] build(deps): bump docker/setup-buildx-action

[branch "master"]
        remote = upstream
        merge = refs/heads/master
[branch "v1.10"]
        remote = upstream
        merge = refs/heads/v1.10
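
When several local branches need to be retargeted, the same `git branch --set-upstream-to` call can be looped over. This is a sketch under assumptions: `track_upstream_everywhere` is a hypothetical helper name, the remote is assumed to be called upstream, and it must have been fetched beforehand.

```shell
# track_upstream_everywhere: hypothetical helper, not part of Git.
# For every local branch that has a same-named branch on the upstream remote,
# make it track upstream/<branch>. Other branches are left untouched.
track_upstream_everywhere() {
  git for-each-ref --format='%(refname)' refs/heads |
  while read -r ref; do
    branch="${ref#refs/heads/}"
    if git show-ref --verify --quiet "refs/remotes/upstream/$branch"; then
      git branch --set-upstream-to="upstream/$branch" "$branch"
    fi
  done
}
```

Contribution branches such as pr/foo have no counterpart on upstream, so the loop skips them.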

Tracking upstream on new branches

To check out a new branch from the upstream repository and have it track upstream directly, we can use --track:

$ git switch -c v1.9 --track upstream/v1.9
Updating files: 100% (7131/7131), done.
Branch 'v1.9' set up to track remote branch 'v1.9' from 'upstream'.
Switched to a new branch 'v1.9'
$ git branch -vv
  master 1070b19ab [upstream/master: behind 104] Add missing Demo App reference
  v1.10  75b4ed957 [upstream/v1.10: behind 96] build(deps): bump docker/setup-buildx-action
* v1.9   f993696f9 [upstream/v1.9] Prepare for release v1.9.7

[branch "v1.9"]
        remote = upstream
        merge = refs/heads/v1.9

Note: --track is also available for git checkout -b, if you prefer it over git switch -c.

Branch management

Since we now have local upstream branches directly tracking the upstream repository, we can manage branches exactly like we would on a regular single-remote repository.

Updating upstream-tracking branches

Usual pull to retrieve latest changes from the upstream:

$ git switch master
Switched to branch 'master'
Your branch is behind 'upstream/master' by 104 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating 1070b19ab..dfc528bbe
Updating files: 100% (567/567), done.
Fast-forward
[...]

Creating new contribution branches

Usual pull-create pattern:

$ git switch master
Switched to branch 'master'
Your branch is behind 'upstream/master' by 104 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating 1070b19ab..dfc528bbe
Updating files: 100% (567/567), done.
Fast-forward
[...]
$ git switch -c pr/foo
Switched to a new branch 'pr/foo'
$ git branch -vv
  master 541214272 [upstream/master] install: Disable kube-proxy-replacement by default
* pr/foo 541214272 install: Disable kube-proxy-replacement by default

Rebasing existing contribution branches

Usual pull-rebase pattern:

$ git switch master
Switched to branch 'master'
Your branch is behind 'upstream/master' by 104 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating 1070b19ab..dfc528bbe
Updating files: 100% (567/567), done.
Fast-forward
[...]
$ git switch pr/foo
Switched to branch 'pr/foo'
$ git rebase master
Successfully rebased and updated refs/heads/pr/foo.

Pros

Compared to sync approaches, we have considerably reduced branch management overhead:

No manual operations to retrieve upstream changes.
No syncing mirror branches on the fork.
Local upstream-tracking branches can be used as if they were regular branches, exactly like we would on a regular single-remote repository.

Cons

If we can pull directly from the upstream, we can also push directly to the upstream. If we have write privileges, we could accidentally commit to a local upstream-tracking branch and push it to upstream directly.

Fortunately, we can easily prevent that from happening. Two solutions:

Blocking via Git config.
Blocking via Git hook.

Blocking via Git config

git config has an optional pushRemote variable for branches, which overrides the previously set remote variable for push operations. We can register a non-existing pushRemote to block push operations on specific branches:

$ git config branch.master.pushRemote DISABLE_PUSH
$ git push
fatal: 'DISABLE_PUSH' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

[branch "master"]
        remote = upstream
        merge = refs/heads/master
        pushRemote = DISABLE_PUSH
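
To apply this setting to every upstream-tracking branch at once, the `git config` call can be looped over local branches. A minimal sketch, assuming the remote is named upstream; `disable_push_on_upstream_branches` is a hypothetical helper name, and DISABLE_PUSH is simply the non-existing remote name used above.

```shell
# disable_push_on_upstream_branches: hypothetical helper, not part of Git.
# Sets pushRemote to the non-existing DISABLE_PUSH remote on every local
# branch whose configured remote is "upstream".
disable_push_on_upstream_branches() {
  git for-each-ref --format='%(refname)' refs/heads |
  while read -r ref; do
    branch="${ref#refs/heads/}"
    # Empty when the branch has no configured remote (e.g. contribution branches)
    remote=$(git config --get "branch.$branch.remote" || true)
    if [ "$remote" = "upstream" ]; then
      git config "branch.$branch.pushRemote" DISABLE_PUSH
    fi
  done
}
```

Branches without a configured remote, or tracking origin, keep their normal push behavior.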

Blocking via Git hook

The pre-push hook exists precisely for this use case. We can create a .git/hooks/pre-push hook in the repository with a list of branches for which to block push operations:

#!/bin/bash

protected_branches=('master' 'v1.10' 'v1.9')

# Git passes one line per pushed ref on stdin
while read -r local_ref local_oid remote_ref remote_oid; do
  # Strip the 'refs/heads/' prefix only, so that a branch such as
  # pr/master is not mistaken for master
  # e.g. refs/heads/master -> master
  current_branch="${local_ref#refs/heads/}"
  for protected_branch in "${protected_branches[@]}"; do
    if [ "${protected_branch}" = "${current_branch}" ]; then
      echo "push denied: ${protected_branch} is protected"
      exit 1
    fi
  done
done

$ git push
push denied: master is protected
error: failed to push some refs to 'github.com:nbusseneau/cilium.git'

Note: I personally block push operations via pushRemote. The pre-push hook in the example above seems to be working fine, but I wrote it in a few minutes and did not extensively test it. Be careful! 😉

Fetch-only upstream branches

This second approach is a bit more complex. From a clean repository, we start by setting up origin and upstream remotes as above:

$ git clone git@github.com:nbusseneau/cilium.git
Cloning into 'cilium'...
[...]
$ cd cilium
$ git remote add upstream git@github.com:cilium/cilium.git
$ git remote -v
origin  git@github.com:nbusseneau/cilium.git (fetch)
origin  git@github.com:nbusseneau/cilium.git (push)
upstream        git@github.com:cilium/cilium.git (fetch)
upstream        git@github.com:cilium/cilium.git (push)

Branch management

We will directly use remote upstream branches via git fetch upstream <REMOTE_BRANCH>, and thus never have to manage any local upstream branches.

Creating new contribution branches

Fetch-create pattern.

First, we fetch the starting branch from upstream to ensure it is up to date – akin to a git pull before branching off.
Then, we create a new branch with upstream/<BRANCH_NAME> as starting point:

$ git fetch upstream master
[...]
From github.com:cilium/cilium
 * branch                master     -> FETCH_HEAD
   dfc528bbe..541214272  master     -> upstream/master
$ git switch -c pr/foo upstream/master --no-track
Switched to a new branch 'pr/foo'
$ git branch -vv
  master 1070b19ab [origin/master] Add missing Demo App reference
* pr/foo 541214272 install: Disable kube-proxy-replacement by default

Notice the use of --no-track when creating the branch: if not provided, --track upstream/master is assumed, which is not what we want.

Rebasing existing contribution branches

Fetch-rebase pattern.

First, we fetch the rebase base branch from upstream to ensure it is up to date – akin to a git pull before rebasing.
Then, we rebase on upstream/<BRANCH_NAME> rather than a local branch:

$ git fetch upstream master
[...]
From github.com:cilium/cilium
 * branch                master     -> FETCH_HEAD
   dfc528bbe..541214272  master     -> upstream/master
$ git rebase upstream/master
Successfully rebased and updated refs/heads/pr/foo.

Pros

This approach is extremely lean and minimal:

We have completely eliminated local upstream branches: the repository only contains contribution branches.
No branch switching when creating or rebasing.

It also does not require any safeguard against accidental pushes to upstream.

Cons

Branch management is non-standard. We will probably want to set up Git aliases, notably so as not to forget the --no-track flag.
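
As one possible way to avoid typing the patterns by hand, they can be wrapped in small shell functions. This is only a sketch: the function names `git_fetch_create` and `git_fetch_rebase` are hypothetical, the remote is assumed to be named upstream, and master is assumed as the default base branch.

```shell
# Hypothetical helpers, not part of Git. Both assume a remote named "upstream".

# git_fetch_create <new-branch> [<upstream-branch>]
# Fetch-create: fetch the starting branch, then branch off it without tracking.
git_fetch_create() {
  base="${2:-master}"
  git fetch upstream "$base" &&
    git switch --no-track -c "$1" "upstream/$base"
}

# git_fetch_rebase [<upstream-branch>]
# Fetch-rebase: fetch the base branch, then rebase the current branch onto it.
git_fetch_rebase() {
  base="${1:-master}"
  git fetch upstream "$base" &&
    git rebase "upstream/$base"
}
```

For example, `git_fetch_create pr/foo` starts a contribution branch from the latest upstream/master, and `git_fetch_rebase v1.10` rebases the current branch onto the latest upstream/v1.10.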

Upfetch

This third approach is a compromise emerging from the other two. It works like the upstream-tracking branches approach, but we incorporate a variant of the fetch patterns from the fetch-only approach via git fetch upstream <REMOTE_BRANCH>:<LOCAL_BRANCH>:

$ git fetch upstream master:master
From github.com:cilium/cilium
   1070b19ab..541214272  master     -> master

This neat git fetch trick lets us fetch a remote branch and update a local branch in one go, without having to check it out first:

$ git branch -vv
  master 541214272 [upstream/master] install: Disable kube-proxy-replacement by default
* pr/foo 1070b19ab Add missing Demo App reference

I dubbed it the “upfetch”.

Branch management

The upfetch is very efficient compared to the previous patterns:

Regular pull-create / pull-rebase require extraneous switches, which can be annoying on huge repositories.
Fetch-create / fetch-rebase with git fetch upstream <REMOTE_BRANCH> do not update local branches.

Creating new contribution branches

Upfetch-create pattern:

$ git fetch upstream master:master
From github.com:cilium/cilium
   1070b19ab..541214272  master     -> master
$ git switch -c pr/foo
Switched to a new branch 'pr/foo'
$ git branch -vv
  master 541214272 [upstream/master] install: Disable kube-proxy-replacement by default
* pr/foo 541214272 install: Disable kube-proxy-replacement by default

Rebasing existing contribution branches

Upfetch-rebase pattern:

$ git fetch upstream master:master
From github.com:cilium/cilium
   1070b19ab..541214272  master     -> master
$ git rebase master
Successfully rebased and updated refs/heads/pr/foo.
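
One caveat with `git fetch upstream <REMOTE_BRANCH>:<LOCAL_BRANCH>`: Git refuses to fetch into the branch that is currently checked out, and it only accepts fast-forward updates of the local branch. A possible wrapper, sketched under the assumption that the remote is named upstream (`upfetch` as a function name is hypothetical), falls back to a fast-forward-only pull in the first case:

```shell
# upfetch [<branch>]: hypothetical helper, not part of Git.
# Updates the local <branch> from upstream without switching to it.
# git fetch refuses to update the currently checked-out branch, so in that
# case fall back to a fast-forward-only pull from upstream.
upfetch() {
  branch="${1:-master}"
  # Empty when HEAD is detached
  current=$(git symbolic-ref --quiet --short HEAD || true)
  if [ "$current" = "$branch" ]; then
    git pull --ff-only upstream "$branch"
  else
    git fetch upstream "$branch:$branch"
  fi
}
```

With this, `upfetch master` followed by `git rebase master` or `git switch -c pr/foo master` behaves the same whether or not master happens to be checked out.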

Pros

Best of both worlds:

When working with upstream branches: intuitive branch management, as on a regular single-remote repository.
When working with contribution branches: minimal branch operations, no extraneous switches.

Cons

Same as upstream-tracking branches: need to prevent accidental pushes to the upstream.

tl;dr

In my opinion, syncing forks is fundamentally useless. A contribution fork is merely a container for contribution branches: non-contribution branches are only mirroring upstream, and syncing them is unnecessary since we can directly use the upstream.

In this post, we propose three “no sync” approaches:

Upstream-tracking local branches, which allow for managing branches exactly like we would on a regular single-remote repository.
Fetch-only upstream branches, which allow for minimal branch management – only contribution branches, nothing else.
The upfetch, a mix of the other two based on a neat git fetch trick.

In all cases, we never sync the fork, which reduces branch management overhead.
