Reducing Acquia Git repository size

On long-running projects, it is common to see the size of the Acquia Git repository grow. Even without pushing large files, the commits added over the years can grow the repository to a point where it reduces the efficiency of the team and of its processes. Various operations on local, CI or Acquia environments require switching branches or checking out the repository. With a large repository, these operations become slower and reduce the team's responsiveness.

Most of the steps apply whatever CI/CD is used on the project. However, some steps may only apply when the development team works on an external Git repository (such as GitHub or GitLab) and uses BLT to generate the deployment artifact and push it to Acquia Git. More information about this workflow is available in the BLT documentation.

Let's walk through some steps that can be taken to reduce the repository size and, where possible, keep it from growing again.
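
Before starting, it can help to record a baseline so that the effect of each step can be checked. The following is a minimal sketch, assuming a local clone of the repository is available. The helper name repo_pack_size and the example path are hypothetical; git count-objects -vH is a standard Git command which reports, among other things, the size of the packed objects (size-pack). Note that it measures the local clone, not the server-side repository.

```shell
#!/bin/bash

# Hypothetical helper: prints the size of the packed objects of the
# repository located in the directory given as first argument.
repo_pack_size() {
  git -C "$1" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Example usage (the path is an assumption):
#   repo_pack_size /path/to/local/clone
```

Running it on a fresh clone before and after the cleanup gives a rough idea of the gain, since a fresh clone only contains the objects which are still reachable from a branch or a tag.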

Remove old tags

Many projects use tags to manage production releases. It is a good practice. However, without attention, the number of tags becomes huge and can increase the repository size. For a team using BLT to generate the release tag from an external repository, keeping an old tag history on Acquia Git is even less important, given that these tags can be regenerated.

Here is an example bash script which fetches the tags from the Acquia Git repository, keeps the 5 latest release tags and deletes all the other tags. It must obviously be adapted to your specific needs, your tag naming convention (here we assume the 1.1.1-build format) and your repository name.

#!/bin/bash

# The number of tags to preserve.
nb_to_keep=5

# The Acquia Git repository url.
repo="[email protected]"

i=0
to_keep=""
to_delete=""

# Get the release tags and keep the last ones (using nb_to_keep).
refs=$(git ls-remote -t --refs $repo | grep -o -E "refs/tags/.*$" | grep -o -E "refs/tags/[0-9]+\.[0-9]+\.[0-9]+-build$" | sort -r -t '/' -k 3 -V)
for ref in $refs ; do
  tag=$(echo $ref | cut -d'/' -f3)

  if (( $i < $nb_to_keep )) ; then
    echo "Keeping $tag"
    if [ "$to_keep" = "" ] ; then
      to_keep="$tag"
    else
      to_keep+=" $tag"
    fi
    ((i++))
  fi
done

# Get all the tags and compare against the to_keep list.
refs=$(git ls-remote -t --refs $repo | grep -o -E "refs/tags/.*$" | sort -r -t '/' -k 3 -V)
for ref in $refs ; do
  # Keep everything after refs/tags/ so that tag names containing a slash stay whole.
  tag=$(echo "$ref" | cut -d'/' -f3-)

  for keep_tag in $to_keep ; do
    if [ "$tag" = "$keep_tag" ] ; then
      break
    fi
  done

  if [ "$tag" != "$keep_tag" ] ; then
    echo "Marking $tag as to be deleted"
    if [ "$to_delete" = "" ] ; then
      to_delete="$tag"
    else
      to_delete+=" $tag"
    fi
  fi
done

# Delete the identified tags after confirmation.
if [ ! "$to_delete" = "" ] ; then
  read -p "Do you confirm the deletion of '$to_delete' tags for $repo? " yn
  echo
  if [ "$yn" = y ] ; then
    for tag in $to_delete ; do
      echo "Deleting $tag"
      git push --delete $repo $tag
    done

    git remote prune $repo
  else
    echo "Nothing deleted"
  fi
fi

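Removing the old tags is definitely not the step that will drastically reduce the repository size, but it is a mandatory step for the next ones to be effective: as long as a tag points to an old commit, that commit and the history behind it stay in the repository.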
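
The selection logic can be tried without touching any remote. The following is a minimal sketch, not part of the script above: the function name tags_to_delete is hypothetical, it reads tag names on standard input and, like the script, relies on the version sort (sort -V) of GNU coreutils and on the 1.1.1-build naming convention.

```shell
#!/bin/bash

# Hypothetical helper: reads tag names on stdin (one per line) and prints the
# tags that would be deleted when keeping the $1 most recent release tags.
tags_to_delete() {
  local nb_to_keep="$1" all_tags keep
  all_tags=$(cat)

  # Release tags are assumed to follow the 1.1.1-build naming convention.
  keep=$(printf '%s\n' "$all_tags" \
    | grep -E '^[0-9]+\.[0-9]+\.[0-9]+-build$' \
    | sort -r -V \
    | head -n "$nb_to_keep" \
    | tr '\n' ' ')

  # Print every non-empty line which is not in the keep list.
  printf '%s\n' "$all_tags" | awk -v keep="$keep" '
    BEGIN { n = split(keep, k, " "); for (i = 1; i <= n; i++) kept[k[i]] = 1 }
    NF && !($0 in kept)'
}

# Example with made-up tags: prints 1.2.0-build and old-tag.
printf '%s\n' 1.9.0-build 1.10.0-build 1.2.0-build old-tag | tags_to_delete 2
```

In real use, the input could come from git ls-remote -t --refs once the refs/tags/ prefix has been stripped, which keeps the deletion itself a separate, confirmed step.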

Remove useless branches

It is common to deploy a feature branch to a lower environment for an early demo, or to create a hotfix branch and deploy it to validate the fix before deploying to production. Many situations may lead to the creation and deployment of a temporary branch. It is also very common to simply forget about these branches once they are merged into the mainline. It is very unlikely that these branches will be used again and, for teams building the artifact from the external repository, restoring them would be an easy operation with BLT deploy.

To avoid spending time checking the branch list on a regular basis, it is best to script the deletion so that the cleanup can be automated. Here is an example bash script which assumes an existing script, get-deployed-branches.sh, that returns the list of deployed branches. It compares that list against the branches in the repository and deletes the branches which are not deployed. The creation of get-deployed-branches.sh is not detailed here as it may vary depending on the context (Acquia Cloud, Acquia Cloud Site Factory, ...). Some default branches are also listed so that they are never deleted by mistake.

#!/bin/bash

repo="[email protected]"

# Get the branches currently deployed.
deployed_branches=$(./get-deployed-branches.sh)

# Hardcode some branches for safety. These branches are supposed to always be deployed on at least one env/stack.
deployed_branches+=" develop-build qa-build uat-build"

to_delete=""

refs=$(git ls-remote -h $repo | grep -o -E "refs/heads/.*-build$")

# Build the list of branches to be deleted.
for ref in $refs ; do
  # Keep everything after refs/heads/ so that branch names containing a slash stay whole.
  branch=$(echo "$ref" | cut -d'/' -f3-)

  for deployed_branch in $deployed_branches ; do
    if [ "$branch" = "$deployed_branch" ] ; then
      break
    fi
  done

  if [ "$branch" = "$deployed_branch" ] ; then
    echo "Keeping $branch"
  else
    echo "Marking $branch as to be deleted"
    if [ "$to_delete" = "" ] ; then
      to_delete="$branch"
    else
      to_delete+=" $branch"
    fi
  fi
done

# Delete the identified branches after confirmation.
if [ ! "$to_delete" = "" ] ; then
  read -p "Do you confirm the deletion of '$to_delete' branches on $repo? " yn
  echo
  if [ "$yn" = y ] ; then
    for branch in $to_delete ; do
      echo "Deleting $branch"
      git push "$repo" ":refs/heads/$branch"
    done

    git remote prune $repo
  else
    echo "Nothing deleted"
  fi
fi
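
The comparison step can be rehearsed on canned data before running any deletion. The following is a minimal sketch under stated assumptions: the function name branches_to_delete is hypothetical, the deployed branches are passed as arguments, and the input is expected to look like the output of git ls-remote -h. Unlike a cut on the third field, it keeps branch names containing a slash whole.

```shell
#!/bin/bash

# Hypothetical helper: reads "git ls-remote -h" output on stdin and prints the
# *-build branches which are not listed in the arguments.
branches_to_delete() {
  local deployed=" $* "
  # Each input line looks like "<sha><TAB>refs/heads/<branch>".
  sed -n 's|^.*refs/heads/\(.*-build\)$|\1|p' | while IFS= read -r branch ; do
    case "$deployed" in
      *" $branch "*) ;;               # Deployed: keep it.
      *) printf '%s\n' "$branch" ;;   # Not deployed: candidate for deletion.
    esac
  done
}

# Example with made-up refs: prints feature/demo-build only.
printf '%s\t%s\n' \
  aaa refs/heads/develop-build \
  bbb refs/heads/feature/demo-build \
  ccc refs/heads/main \
  | branches_to_delete develop-build qa-build
```

Printing the candidates first and deleting them in a second, confirmed step mirrors the structure of the script above and makes a dry run possible.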

Reset branches history

For projects using an external repository, the commit history on the Acquia Git repository is not really important, yet it has a huge impact on the size. The principle is to keep only the last commit on each branch.

Here are the git commands used to reset the history of a branch:

git checkout <branch-name>
git pull --all
git checkout --orphan <branch-name>-tmp
git add .
git commit -m "Starting a fresh orphan branch for <branch-name>"
git branch -D <branch-name>
git branch -m <branch-name>
git push -f origin <branch-name>
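
These commands can be rehearsed on a local clone before anything is force-pushed. The following is a minimal sketch: the function names reset_local_branch_history and count_commits are hypothetical, the functions must be run from inside the clone, and the Git user name and email are assumed to be configured already. After the reset, the branch is expected to contain exactly one commit.

```shell
#!/bin/bash

# Hypothetical helper: applies the orphan-branch steps to a local branch,
# without pushing anything. Must be run from inside the repository.
reset_local_branch_history() {
  local branch="$1"
  git checkout --quiet "$branch" || return 1
  # Start a branch without history; the files stay in the working tree.
  git checkout --quiet --orphan "$branch-tmp" || return 1
  git add --all .
  git commit --quiet -m "Starting a fresh orphan branch for $branch" || return 1
  # Replace the old branch with the new single-commit branch.
  git branch --quiet -D "$branch" || return 1
  git branch -m "$branch"
}

# Prints the number of commits reachable from the given branch.
count_commits() {
  git rev-list --count "$1"
}

# Example usage (the branch name is an assumption):
#   reset_local_branch_history develop-build
#   count_commits develop-build
```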

Ideally, this step should be done when BLT pushes the artifact during blt deploy. An enhancement ticket has been created on the BLT repository to track this, and it may become a simple option in the blt.yml file in the future. However, the git push -f operation takes longer than a simple git push and can have an impact in some specific cases (pushing to multiple repositories in the case of multiple stacks on ACSF, for example). For this reason, it may be worth scripting the history reset of all the branches so that it can be run in an automated process.

Here is an example of a bash script resetting the history of all the branches of the Acquia Git repository.

#!/bin/bash

# The Acquia Git repository url.
repo="[email protected]"

# Clone the repository in a temporary directory.
rm -Rf /tmp/tmp_clone
git clone $repo /tmp/tmp_clone
cd /tmp/tmp_clone || exit

# Configure Git.
git config user.name "Github-Actions-CI"
git config user.email "[email protected]"
git config checkout.defaultRemote origin
git config advice.detachedHead false

refs=$(git ls-remote -h "$repo" | grep -o -E "refs/heads/.*-build$")

# Reset the history of each branch and push the result.
for ref in $refs ; do
  # Keep everything after refs/heads/ so that branch names containing a slash stay whole.
  ref_name=$(echo "$ref" | cut -d '/' -f3-)
  
  git checkout $ref_name
  git checkout --orphan $ref_name-tmp
  git add .
  git commit -m "Starting a fresh orphan branch for $ref_name" --quiet
  git branch -D $ref_name
  git branch -m $ref_name

  # Name the branch explicitly: the recreated branch has no upstream configured.
  git push -f origin "$ref_name"
done

git remote prune origin