Speed Up a Pipeline by Using a Git Shallow Clone

4 minute read Published: 2022-05-25

When you work with a big repository that contains a lot of commits, the CI system can spend a lot of time fetching it. And often, you don't need the whole history to run the test suites.

By default, GitLab does a shallow clone and fetches only the latest 50 commits, but this can be configured globally or per job by setting the GIT_DEPTH variable.

Often in a CI system, you want the build artifacts of your pipeline to be tagged with a unique version number. A convenient way to do this is to use git describe, which gives a version made of the last tag, the number of commits since that tag, and a short commit hash.
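As an illustration, the output of git describe --tags --long always has the form <tag>-<count>-g<hash>, so it can be split into its three parts with plain shell. This is a minimal sketch: the function name parse_describe and the DESCRIBE_* variables are hypothetical, and the sample version string is made up.

```shell
#!/bin/sh
# Split `git describe --tags --long` output (<tag>-<count>-g<hash>)
# into its three parts. Parsing from the right keeps tags that
# themselves contain hyphens (e.g. "v1.0-rc1") intact.
# parse_describe and the DESCRIBE_* names are hypothetical.
parse_describe() {
    described=$1
    hash_part=${described##*-}      # last field, e.g. "g2414721"
    rest=${described%-*}            # everything before it, e.g. "v1.2.0-14"
    DESCRIBE_HASH=${hash_part#g}    # drop the leading "g"
    DESCRIBE_COUNT=${rest##*-}      # commits since the tag
    DESCRIBE_TAG=${rest%-*}         # the tag itself
}

# Made-up sample; in a job this would be "$(git describe --tags --long)"
parse_describe "v1.2.0-14-g2414721"
echo "tag=${DESCRIBE_TAG} commits=${DESCRIBE_COUNT} hash=${DESCRIBE_HASH}"
# prints: tag=v1.2.0 commits=14 hash=2414721
```

The --long flag matters here: without it, git describe prints only the tag name when HEAD is exactly on a tag, and the split would not apply.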

But with only the latest 50 commits in the history, you can't be sure that git describe will find the last tag. When this first happens, the simplest fix is to set GIT_DEPTH: 0, which tells GitLab to do a full clone instead of a shallow one.

This, of course, leads to a longer pipeline. What if your project normally has no more than 20 commits between two releases, but you still want the job to work no matter how many commits there are since the last tag?

This can be done on GitLab with the following configuration:

variables:
  GIT_DEPTH: 20

.scripts:
  # This snippet ensures that at least one tag is present in the history of the current branch
  git-fetch-last-tag:
    - |
      echo "Fetching the latest tag..."
      # Make sure all tags are available locally, but without their history
      git fetch -q --depth=1 --tags
      # --tags makes git describe consider both annotated and lightweight tags
      until git describe --tags; do
          echo "git: tag not found, fetching more commits..."
          git fetch -q --deepen=20 origin "${CI_COMMIT_BRANCH}"
      done
      echo "GIT_LAST_TAG: $(git describe --tags --abbrev=0)"

release_job:
  before_script:
    - !reference [.scripts, git-fetch-last-tag]
  script:
    - do_release
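
One caveat with the loop above: if no tag is reachable from the branch at all, git describe never succeeds and the job only stops when it hits its timeout. A bounded variant could look like the sketch below. The helper name deepen_until and its argument order are hypothetical; it is an illustration, not a GitLab or Git feature.

```shell
#!/bin/sh
# deepen_until MAX_TRIES CHECK_CMD DEEPEN_CMD   (hypothetical helper)
# Runs CHECK_CMD; while it fails, runs DEEPEN_CMD and checks again.
# Gives up with a non-zero status after MAX_TRIES failed checks.
deepen_until() {
    max_tries=$1
    check_cmd=$2
    deepen_cmd=$3
    tries=0
    until sh -c "$check_cmd" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up after ${tries} failed checks" >&2
            return 1
        fi
        sh -c "$deepen_cmd"
    done
}

# Possible use in the job above (assumes a Git checkout and a reachable remote):
#   deepen_until 10 'git describe --tags' \
#       "git fetch -q --deepen=20 origin '${CI_COMMIT_BRANCH}'"
```

With 10 tries and a step of 20, this sketch would look back roughly 200 commits before failing the job with a clear message instead of looping forever.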

Another common job that doesn't work out of the box with a shallow clone is one that checks every commit of a merge request. For example, you may want a job that checks that your commit messages follow the Conventional Commits convention.

For this to work, you need to find the common ancestor of the current branch and the main branch of the project. Git has a command that returns the SHA of this common ancestor: git merge-base HEAD main

To use it in CI:

variables:
  GIT_DEPTH: 5

.scripts:
  # This snippet ensures that the common ancestor is in the history
  # and exports its commit SHA as GIT_MERGE_BASE_SHA
  git-fetch-merge-base:
    - |
      echo "Fetching the Git merge base commit..."
      # First, fetch one commit from the main branch.
      # Without it, Git doesn't know that the branch exists
      git fetch -q --depth=1 origin "refs/heads/${CI_DEFAULT_BRANCH}:refs/heads/${CI_DEFAULT_BRANCH}"
      # If you just rebased your branch on top of main and it has no more
      # than 4 commits, git merge-base already works at this point
      until git merge-base HEAD "${CI_DEFAULT_BRANCH}"; do
        echo "git: merge-base not found, fetching more commits..."
        # Otherwise, fetch 10 more commits from both the main branch
        # and the current branch until the ancestor is found.
        # Note: CI_COMMIT_BRANCH is only set in branch pipelines,
        # not in merge request pipelines
        git fetch -q --deepen=10 origin "${CI_DEFAULT_BRANCH}" "${CI_COMMIT_BRANCH}"
      done

      GIT_MERGE_BASE_SHA=$(git merge-base HEAD "${CI_DEFAULT_BRANCH}")
      echo "GIT_MERGE_BASE_SHA: ${GIT_MERGE_BASE_SHA}"
      export GIT_MERGE_BASE_SHA

check-commit:
  before_script:
    - !reference [.scripts, git-fetch-merge-base]
  script:
    - echo "The common ancestor SHA is: ${GIT_MERGE_BASE_SHA}"
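
The echo above is only a placeholder. As a rough sketch of what a real check could do once GIT_MERGE_BASE_SHA is known, the functions below validate commit subjects against a simplified conventional commit pattern. The function names and the list of accepted types are assumptions made for this example; adapt them to your own convention.

```shell
#!/bin/sh
# Return 0 when a subject looks like "<type>(<scope>)!: <description>".
# The accepted types below are an assumption for this sketch.
is_conventional_subject() {
    printf '%s\n' "$1" | grep -Eq \
        '^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([^()]+\))?!?: .+'
}

# Read commit subjects on stdin, one per line, and report the invalid ones.
# Returns non-zero if at least one subject is invalid.
check_subjects() {
    bad=0
    while IFS= read -r subject; do
        if ! is_conventional_subject "$subject"; then
            echo "invalid commit subject: ${subject}"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}

# Demo with made-up subjects; the second one is rejected
printf '%s\n' "feat(ci): use a shallow clone" "fix typo" \
    | check_subjects || echo "some commits need rewording"
```

In the check-commit job, the subjects could come from git log --format=%s "${GIT_MERGE_BASE_SHA}..HEAD" piped into check_subjects, so that the job fails as soon as one commit on the branch does not follow the convention.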

With these two snippets, all your CI jobs can use a fast shallow clone, and only the jobs that need at least one tag or the GIT_MERGE_BASE_SHA variable fetch more history as needed.