Permanently deleting files and history from Git

Posted by l_kris06 on Wed, 22 Dec 2021 09:29:59 +0100

You may want to permanently delete files and their history from a Git repository in cases like these:

  • You accidentally added a file that should not be under version control: sensitive data, large files, or other useless files.
  • You accidentally pushed an article about cracking a well-known piece of software to a GitHub repository. GitHub will then email you asking you to remove the file completely, otherwise the repository will be blocked.
  • You want to permanently remove sensitive data or useless files from the repository without leaving a trace, so that they are not only invisible in the version history but the space they occupied is also freed.

See the official GitHub help document:
https://help.github.com/articles/remove-sensitive-data

It explains the steps in detail and also describes how to use the BFG tool, which is more convenient.

Here I will only cover the approach that uses git commands. The Windows platform is used as the example; Linux works similarly:

Using filter-branch

Note: if you run git filter-branch after stashing changes, you will not be able to retrieve those changes with other stash commands.

It is recommended to unstash any changes before running git filter-branch, for example with git stash pop. (To un-apply a stash that has already been applied, the Git book describes git stash show -p | git apply -R.) For more information, see https://git-scm.com/book/en/v1/Git-Tools-Stashing.
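As a quick pre-flight check for the stash warning above, a small helper can refuse to continue while stash entries exist. This is only a sketch; the function name no_stashes is made up for illustration.

```shell
# Hypothetical helper: succeeds only when the current repository has no stash entries.
no_stashes() {
    # `git stash list` prints one line per stash entry; empty output means there are none.
    [ -z "$(git stash list)" ]
}

# Example use inside a repository (commented out so the snippet only defines the helper):
# if no_stashes; then
#     echo "No stashed changes, safe to run git filter-branch."
# else
#     echo "Stashed changes found, restore them first."
# fi
```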

The demonstration is as follows:

  1. Enter the git repository and run the command below, replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the relative path of the file to be deleted (not just the file name).

The parameters of this command do the following:

  1. Force Git to process (but not check out) the full history of every branch and tag;
  2. Delete the specified file and any empty commits generated as a result;
  3. Overwrite your existing tags.

$ git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' \
--prune-empty --tag-name-filter cat -- --all

Here, PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA is the path of the file you want to delete, relative to the root directory of the git repository. Note that if the path starts with '/', it will be treated as starting from the Git installation directory.

If the target is a folder rather than a file, add the -r option to git rm --cached so that the folder, its subfolders, and their files are deleted recursively, similar to rm -rf.
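The folder case might look like the sketch below. The helper name purge_folder and the variable PURGE_TARGET are assumptions made for illustration; the folder path is passed through the environment so that it does not need to be quoted inside the filter string.

```shell
# Hypothetical helper: remove a folder (recursively) from the history of all branches and tags.
# Usage: purge_folder PATH-TO-YOUR-FOLDER   (path relative to the repository root)
purge_folder() {
    # PURGE_TARGET is exported only for this one command; the index filter reads it
    # when git filter-branch evaluates the filter for each commit.
    PURGE_TARGET="$1" git filter-branch --force \
        --index-filter 'git rm -r --cached --ignore-unmatch -- "$PURGE_TARGET"' \
        --prune-empty --tag-name-filter cat -- --all
}
```

The only difference from the single-file command is the -r option; everything else is the same invocation shown earlier.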

If you want to delete many files, you can write a .sh script and run the commands in batch. If a file name or path contains Chinese characters, you can use the * wildcard instead, for example sound/music_*.mp3, which deletes all mp3 files in the sound directory whose names start with music_.

For example, create a new bash script file, del-music-mp3.sh:

#!/bin/bash
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch projects/Moon.mp3' --prune-empty --tag-name-filter cat -- --all
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch sound/Music_*.mp3' --prune-empty --tag-name-filter cat -- --all

If you see output like the following, the deletion succeeded:

Rewrite 48dc599c80e20527ed902928085e7861e6b3cbe6 (266/266)

If "xxxxx unchanged" is displayed, the file was not found in the repo. Check whether the path and file name are correct.
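The batch script shown earlier rewrites the full history once per path, which can be slow on a large repo. One possible variant handles several paths in a single pass. This is only a sketch: purge_paths is a made-up helper name, and it relies on git rev-parse --sq-quote to quote the paths for the filter string.

```shell
# Hypothetical helper: remove several files or wildcard patterns in one history rewrite.
# Usage: purge_paths projects/Moon.mp3 'sound/Music_*.mp3'
purge_paths() {
    if [ "$#" -eq 0 ]; then
        echo "usage: purge_paths PATH..." >&2
        return 1
    fi
    # Quote every argument so spaces and wildcards survive evaluation of the filter;
    # git rm itself expands the wildcards against the index of each commit.
    quoted=$(git rev-parse --sq-quote "$@")
    git filter-branch --force \
        --index-filter "git rm --cached --ignore-unmatch -- $quoted" \
        --prune-empty --tag-name-filter cat -- --all
}
```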

Add to the .gitignore file and push the modified repo

If you never want this file or folder to be uploaded again, add it to the .gitignore file and then push your repo.

Add to the .gitignore file:

$ echo "YOUR-FILE-WITH-SENSITIVE-DATA" >> .gitignore
$ git add .gitignore
$ git commit -m "Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore"
[master 051452f] Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore
 1 files changed, 1 insertions(+), 0 deletions(-)

Double-check that everything you wanted to delete has been removed from the repository's history, and that all of your branches are checked out.
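One way to do this double-check from the command line is to ask git log whether any commit on the local branches or tags still touches the path. The helper name path_in_history is hypothetical. It deliberately avoids --all, because the backup refs that git filter-branch keeps under refs/original still point at the old history.

```shell
# Hypothetical helper: succeeds if any commit reachable from local branches or tags
# still touches the given path.
# Usage: path_in_history PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA
path_in_history() {
    # --branches and --tags skip refs/original, the backup left by git filter-branch.
    [ -n "$(git log --branches --tags --format=%H -- "$1")" ]
}

# Example use inside a repository (commented out so the snippet only defines the helper):
# if path_in_history PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA; then
#     echo "The path still appears in history."
# fi
```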

Then force-push to overwrite the remote repo. The command is as follows:

$ git push origin --force --all
Counting objects: 1074, done.
Delta compression using 2 threads.
Compressing objects: 100% (677/677), done.
Writing objects: 100% (1058/1058), 148.85 KiB, done.
Total 1058 (delta 590), reused 602 (delta 378)
To https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
 + 48dc599...051452f master -> master (forced update)

This process actually re-uploads the whole repo, so it is time-consuming. It is similar to deleting the repo and creating it again, but with the advantage that the original commit history is retained. If you really do not care about that history, you can simply delete the repo and rebuild it; there is not much difference between the two, and the latter may be more intuitive.

To remove the file or folder you specified from your tagged releases as well, force-push your Git tags with this command:

$ git push origin master --force --tags

Tell your collaborators to rebase, not merge, any branches they created from your old (contaminated) repository history. A single merge commit could reintroduce some or all of the contaminated history that you just went to the trouble of clearing.

Cleaning and reclaiming space

After some time has passed and you are sure that git filter-branch had no unexpected side effects, you can use the following commands to force all objects in the local repository to be dereferenced and garbage collected (GC).

$ git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
$ git reflog expire --expire=now --all
$ git gc --prune=now
Counting objects: 2437, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1378/1378), done.
Writing objects: 100% (2437/2437), done.
Total 2437 (delta 1461), reused 1802 (delta 1048)
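To see whether the cleanup actually freed space, one option is to compare the size of the object store before and after running the commands above. git count-objects -v reports size (loose objects) and size-pack (packfiles) in KiB; the helper name repo_size_kib is made up for this sketch.

```shell
# Hypothetical helper: print the size of the repository's object store in KiB,
# adding up loose objects ("size:") and packfiles ("size-pack:").
repo_size_kib() {
    git count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Example use inside a repository (commented out so the snippet only defines the helper):
# before=$(repo_size_kib)
# ... run the reflog expire and gc commands ...
# after=$(repo_size_kib)
# echo "Object store: ${before} KiB before, ${after} KiB after"
```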

You can also achieve the same result by pushing the filtered history into a new or empty repository and then making a fresh clone from GitHub.

The first of the cleanup commands above (the git for-each-ref line) can also be replaced by:

$ rm -rf .git/refs/original/

Reference from:

https://www.shuzhiduo.com/A/x9J2kXxn56

Topics: Windows git