On long-running projects, it is common to see the Acquia Git repository grow. Even without pushing large files, the commits accumulated over the years can grow the repository to a size that reduces the efficiency of the team and its processes. Various operations on local, CI, or Acquia environments require switching branches or checking out the repository. With a large repository, these operations become slower and reduce the team's responsiveness.
Most of the steps apply whatever CI/CD tooling the project uses. However, some steps only apply when the development team works on an external Git repository (such as GitHub or GitLab), uses BLT to generate the deployment artifact, and pushes it to Acquia Git. More information about this workflow is available in the BLT documentation.
Let's walk through some steps that can be taken to reduce the repository size and, potentially, keep it from growing again.
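Before starting, it can help to record the repository size so that the effect of each step can be measured. The following is a minimal sketch, assuming a local clone is available; the helper name repo_size_kib is hypothetical, and the figure reflects the local object database, which may differ from what Acquia reports.

```bash
#!/bin/bash

# Print the size of a repository's object database, in KiB.
# Usage: repo_size_kib [path-to-clone]   (defaults to the current directory)
repo_size_kib() {
  local dir="${1:-.}"
  # "git count-objects -v" reports "size" (loose objects) and "size-pack"
  # (packed objects), both expressed in KiB.
  git -C "$dir" count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}
```

Comparing fresh clones made before and after a cleanup gives a rough indication, since a fresh clone only downloads the objects reachable from the remaining branches and tags.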
Remove old tags
Many projects use tags to manage production releases. It is a good practice. However, if we don't pay attention, the number of tags becomes huge and can increase the repository size. For a team using BLT to generate the release tag from an external repository, keeping an old tag history on Acquia Git is even less important, given that these tags can be regenerated.
Here is an example bash script that fetches the tags from the Acquia Git repository, keeps the latest 5, and deletes all the others. It must obviously be adapted to your specific needs, your tag naming convention (here we assume the 1.1.1-build format), and your repository name.
    #!/bin/bash

    # The number of tags to preserve.
    nb_to_keep=5

    # The Acquia Git repository url.
    repo="[email protected]"

    i=0
    to_keep=""
    to_delete=""

    # Get the release tags, newest first, and keep the first nb_to_keep.
    refs=$(git ls-remote -t --refs "$repo" | grep -o -E "refs/tags/[0-9]+\.[0-9]+\.[0-9]+-build$" | sort -r -t '/' -k 3 -V)
    for ref in $refs ; do
      # Strip the "refs/tags/" prefix to get the tag name.
      tag="${ref#refs/tags/}"
      if (( i < nb_to_keep )) ; then
        echo "Keeping $tag"
        if [ "$to_keep" = "" ] ; then
          to_keep="$tag"
        else
          to_keep+=" $tag"
        fi
        ((i++))
      fi
    done

    # Get all the tags and compare them against the to_keep list.
    refs=$(git ls-remote -t --refs "$repo" | grep -o -E "refs/tags/.*$" | sort -r -t '/' -k 3 -V)
    for ref in $refs ; do
      tag="${ref#refs/tags/}"
      for keep_tag in $to_keep ; do
        if [ "$tag" = "$keep_tag" ] ; then
          break
        fi
      done
      # After the loop, keep_tag equals tag only if a match was found.
      if [ "$tag" != "$keep_tag" ] ; then
        echo "Marking $tag as to be deleted"
        if [ "$to_delete" = "" ] ; then
          to_delete="$tag"
        else
          to_delete+=" $tag"
        fi
      fi
    done

    # Delete the identified tags after confirmation.
    if [ -n "$to_delete" ] ; then
      read -r -p "Do you confirm the deletion of '$to_delete' tags for $repo? " yn
      echo
      if [ "$yn" = y ] ; then
        for tag in $to_delete ; do
          echo "Deleting $tag"
          # Use the full ref name so a branch with the same name is never targeted.
          git push --delete "$repo" "refs/tags/$tag"
        done
        git remote prune "$repo"
      else
        echo "Nothing deleted"
      fi
    fi
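The selection logic can be tried without touching the remote. The sketch below reads tag names on standard input and prints those that would be deleted. The function name tags_to_delete and the sample tag names are made up for illustration, and sort -V assumes GNU coreutils or a compatible sort.

```bash
#!/bin/bash

# Print every tag read from stdin except the newest N release tags.
# Release tags are those matching the X.Y.Z-build format.
tags_to_delete() {
  local keep="$1" all kept
  all=$(cat)
  # Newest release tags first, limited to the number to keep.
  kept=$(printf '%s\n' "$all" |
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+-build$' |
    sort -r -V | head -n "$keep")
  # Drop the kept tags (exact, whole-line matches) from the full list.
  printf '%s\n' "$all" | grep -vxF -e "$kept" || true
}

# Dry run on hypothetical tag names, keeping the 2 newest releases.
printf '%s\n' 1.2.0-build 1.10.0-build 1.9.3-build 2.0.0-build wip-tag |
  tags_to_delete 2
```

To preview a real cleanup, the tag names could be fed from git ls-remote -t --refs piped through sed 's|.*refs/tags/||' instead of the sample list.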
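Removing old tags is certainly not the step that will drastically reduce the repository size, but it is required for the next steps to be effective: a tag keeps its commit, and all of that commit's ancestors, reachable even after the branch history has been reset.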
Remove useless branches
It is common to deploy a feature branch to a lower environment for an early demo, or to create a hotfix branch and deploy it to validate the fix before deploying to production. Many situations may lead to the creation and deployment of a temporary branch. It is also very common to simply forget about these branches once they are merged into the mainline. It is very unlikely that these branches will be used again and, for teams building the artifact from the external repository, restoring them would be an easy operation with BLT deploy.
To avoid spending time checking the branch list on a regular basis, it is best to script the deletion so that the cleanup can be automated. Here is an example bash script that assumes an existing script, get-deployed-branches.sh, which returns the list of deployed branches. It compares that list against the branches in the repository and deletes the branches that are not deployed. The creation of get-deployed-branches.sh is not detailed here, as it may vary depending on the context (Acquia Cloud, Acquia Cloud Site Factory, ...). Some default branches are also listed so that they are never deleted by mistake.
    #!/bin/bash

    repo="[email protected]"

    # Get the branches currently deployed.
    deployed_branches=$(./get-deployed-branches.sh)

    # Hardcode some branches for safety. These branches are supposed to
    # always be deployed on at least one env/stack.
    deployed_branches+=" develop-build qa-build uat-build"

    to_delete=""
    refs=$(git ls-remote -h "$repo" | grep -o -E "refs/heads/.*-build$")

    # Build the list of branches to be deleted.
    for ref in $refs ; do
      # Strip the "refs/heads/" prefix; this also handles names containing "/".
      branch="${ref#refs/heads/}"
      for deployed_branch in $deployed_branches ; do
        if [ "$branch" = "$deployed_branch" ] ; then
          break
        fi
      done
      # After the loop, deployed_branch equals branch only if a match was found.
      if [ "$branch" = "$deployed_branch" ] ; then
        echo "Keeping $branch"
      else
        echo "Marking $branch as to be deleted"
        if [ "$to_delete" = "" ] ; then
          to_delete="$branch"
        else
          to_delete+=" $branch"
        fi
      fi
    done

    # Delete the identified branches after confirmation.
    if [ -n "$to_delete" ] ; then
      read -r -p "Do you confirm the deletion of '$to_delete' branches on $repo? " yn
      echo
      if [ "$yn" = y ] ; then
        for branch in $to_delete ; do
          echo "Deleting $branch"
          git push "$repo" ":refs/heads/$branch"
        done
        git remote prune "$repo"
      else
        echo "Nothing deleted"
      fi
    fi
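The comparison at the heart of this script is a set difference, which can be checked in isolation before running anything against the remote. This is a minimal sketch: the function name stale_branches and the branch names in the example are hypothetical, and both lists are assumed to be whitespace-separated, as in the script above.

```bash
#!/bin/bash

# Print, one per line, the branches of the first list that are absent from
# the second list.
# $1: whitespace-separated branches found in the repository
# $2: whitespace-separated branches to keep (deployed or protected)
stale_branches() {
  local repo_branches="$1" keep_branches="$2"
  [ -n "$repo_branches" ] || return 0
  # Word splitting is intentional: it turns each list into one name per line.
  # grep -vxF then removes exact, whole-line matches of the names to keep.
  printf '%s\n' $repo_branches |
    grep -vxF -e "$(printf '%s\n' $keep_branches)" || true
}

# Example with made-up branch names.
stale_branches "develop-build feature-x-build hotfix-1-build qa-build" \
               "develop-build qa-build uat-build"
```

Printing the result first, and deleting only after review, keeps the automated cleanup from removing a branch that get-deployed-branches.sh failed to report.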
Reset branches history
For projects using an external repository, the commit history on the Acquia Git repository is not really important, yet it has a huge impact on the size. The principle is to keep only the last commit on each branch.
Here are the git commands used to reset a branch's history:
    git checkout <branch-name>
    git pull --all
    git checkout --orphan <branch-name>-tmp
    git add .
    git commit -m "Starting a fresh orphan branch for <branch-name>"
    git branch -D <branch-name>
    git branch -m <branch-name>
    git push -f origin <branch-name>
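To see what these commands do before running them against Acquia Git, the same sequence can be tried on a throwaway local repository. The sketch below pushes nothing; the function name reset_branch_history, the demo-build branch, and the demo identity are all hypothetical. It checks that the branch ends up with a single commit whose content is identical to the previous tip.

```bash
#!/bin/bash

# Replace the history of a branch with a single commit holding the same
# files. Local only: nothing is pushed.
reset_branch_history() {
  local branch="$1"
  git checkout --quiet "$branch" || return
  git checkout --quiet --orphan "$branch-tmp" || return
  git add . || return
  git commit --quiet -m "Starting a fresh orphan branch for $branch" || return
  git branch -D "$branch" > /dev/null || return
  git branch -m "$branch"
}

# Build a throwaway repository with three commits on demo-build.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init --quiet .
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/demo-build
for n in 1 2 3 ; do
  echo "$n" > file.txt
  git add file.txt
  git commit --quiet -m "Commit $n"
done

# Compare the tree (the snapshot of files) before and after the reset.
tree_before=$(git rev-parse 'HEAD^{tree}')
reset_branch_history demo-build
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "Commits on demo-build: $(git rev-list --count demo-build)"
if [ "$tree_before" = "$tree_after" ] ; then
  echo "Tree unchanged"
fi
```

Because the tree is unchanged, the deployed code is the same; only the commit hash and the history behind it differ, which is why the final push has to be forced.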
Ideally, this step should be done when BLT pushes the artifact during blt deploy. An enhancement ticket has been created on the BLT repository to track this, and it may become a simple option in the blt.yml file in the future. However, the git push -f operation takes longer than a simple git push and can have an impact in some specific cases (pushing to multiple repositories in the case of multiple stacks on ACSF, for example). For this reason, it may be worthwhile to script the history reset of all the branches so that it can be used in an automated process.
Here is an example of a bash script resetting the history of all the branches of the Acquia Git repository.
    #!/bin/bash

    # The Acquia Git repository url.
    repo="[email protected]"

    # Clone the repository in a temporary directory.
    rm -Rf /tmp/tmp_clone
    git clone "$repo" /tmp/tmp_clone
    cd /tmp/tmp_clone || exit 1

    # Configure Git.
    git config user.name "Github-Actions-CI"
    git config user.email "[email protected]"
    git config checkout.defaultRemote origin
    git config advice.detachedHead false

    # Only the artifact branches (ending with -build) are processed.
    refs=$(git ls-remote -h "$repo" | grep -o -E "refs/heads/.*-build$")

    # Reset the history of each branch and push the result.
    for ref in $refs ; do
      # Strip the "refs/heads/" prefix; this also handles names containing "/".
      ref_name="${ref#refs/heads/}"
      git checkout "$ref_name"
      git checkout --orphan "$ref_name-tmp"
      git add .
      git commit -m "Starting a fresh orphan branch for $ref_name" --quiet
      git branch -D "$ref_name"
      git branch -m "$ref_name"
      # Name the branch explicitly: the recreated branch has no upstream
      # configured, so a bare "git push -f origin" may refuse to push.
      git push -f origin "$ref_name"
    done

    git remote prune origin
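When this script runs on a schedule, many branches will already consist of a single commit from the previous run, and resetting them again would force-push a new commit hash for no benefit. One possible refinement, sketched here under that assumption, is to skip such branches. The helper name has_history is hypothetical and is not part of the original script.

```bash
#!/bin/bash

# Succeed (exit status 0) when the given ref has more than one commit in
# its history, fail with status 1 when it has exactly one, and fail with
# status 2 when the ref cannot be read.
has_history() {
  local ref="$1" count
  count=$(git rev-list --count "$ref") || return 2
  [ "$count" -gt 1 ]
}

# Possible use at the top of the loop in the script above:
#   has_history "origin/$ref_name" || continue
```

With this guard, only the branches that received new artifact commits since the last reset are rewritten and force-pushed.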