Git Dehydration Edition [10. Internal Mechanism]

Posted by Drakkie on Sat, 15 Feb 2020 07:39:34 +0100

a.1 underlying command

Previously, we outlined 30 common Git commands, such as checkout/branch/remote. Since Git's initial goal was a collection of tools, not a VCS system, it contained a large number of underlying commands to be invoked in scripts like Unix systems, so these commands are called underlying commands, which are more user-friendly.Called wrapping commands. Previous descriptions focused on wrapping commands, but now more on the underlying commands, which make it easier for users to understand Git's internal mechanisms, and these commands are not suitable for running manually on the command line, but should be scripted to implement new functionality.

When the user executes git init in a file directory, Git creates a.Git directory and generates the template files needed by the Git warehouse in this default directory. The user can then clone the remote warehouse to the current local directory. Here is the initialized.Git directory, which contains the template files.

$ ls -F1
config
description
HEAD
hooks/
info/
objects/
refs/

The template files generated by different Git versions will be slightly different. The description s file can only be used with GitWeb tools. The config file contains project-related configurations. The info directory contains a global (valid for the entire project) file that contains filtering templates for negligible files (which cannot be placed into the.gitignore file), and the hooks directory contains hook scripts for both client and server.The rest belongs to the Git core. The objects directory holds all the contents of the database. The refs directory holds pointers to submissions of different references, such as branches, labels, remote warehouses. The HEAD file points to the current branch that the user has entered. The index file holds information about the staging area.

A.2 Git object

Git is essentially a file system that contains address information, which means that the Git core uses the simplest key-value pair to hold relevant data. In Git, a unique key is used to get the data needed.A unique key is returned to identify the data.

First the user needs to initialize a new Git repository and the objects directory is empty.

$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test
$ find .git/objects
.git/objects
.git/objects/info
.git/objects/pack
$ find .git/objects -type f

In the objects directory, Git generates the pack and info subdirectories, which are also empty. Using git hash-object, create a new data object and save it manually to the Git database.

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

The data is stored in the Git database with a unique key value (check value), -w option to write the data object to the Git database, --stdin option to read the data object from stdin, and the git hash-object command with a file name parameter to identify the data object. The information output from the command is a 40-bit check value to view the data object saved by Git.

$ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

The objects directory already contains a new file, and the first two bits of the check value are the subdirectory names. In this subdirectory, there is a new file named the last 38 bits of the check value.

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

Create two more data objects,

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

View all the data objects stored in the object database.

$ find .git/objects -type f
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

If you delete the text.txt file, you can use Git to restore different versions of text, as follows:

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1

$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt
$ cat test.txt
version 2

It should be noted that the check values for data objects are not convenient in practice, and Git does not save the file name of such objects, but the file contents. The type of such data objects, called blob s, can be viewed by using git cat-file-t.

$ git cat-file -t 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
blob

File Tree Object

File tree objects save file names and allow a set of files to be saved together, similar to the Unix file system. This is of course simpler. Data objects can be saved as file tree objects or blob objects, file trees for Unix directories, blobs for nodes or file contents, and each file tree object can contain one or more blobs or sub-file trees.The following is a file tree for a sample project.

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

master^{tree} represents the file tree object pointed to by the most recent Submission on the master branch, where lib is a subtree.

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

Note that you may encounter problems when using the master ^{tree} format because ^ is an escape character on the Windows command line. To avoid escaping, git cat-file-p master^{tree} can be used, and {} quotation marks must be added to avoid incorrect parsing of parameters, such as git cat-file-p'master^{tree}'. In ZSH tools, ^ is treated as a wildcard and quotation marks are also required.Enclosing parameters, such as git cat-file-p "master^{tree}".

The internal storage structure of Git is as follows:

At this point, the file tree object has been created. Generally, Git will generate a file tree object based on the state of the temporary area (index), because in order to create a file tree object, the user first needs to temporarily save some files, configure the index, use git update-index, this command can add files to the temporary area, and must include the -add option for file addition, --cacheinfo option, canAdd files saved in the database, not the workspace directory, and must be accompanied by, file type (100644), file checkcode, file name,

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

100644 represents a normal file, 100755 represents an executable file, 120000 represents a symbolic link, and refers to the file type of Unix system. File objects (blobs) can only use these three types, while file directories and sub-modules can choose other types.

With git write-tree, you can write a temporary area to a file tree object, and with the -w option, you can automatically create a file tree object if the file tree object does not exist.

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

With git cat-file-t, you can view the type of file tree object.

$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree

Create a new file tree object using a new version of the test.txt file and a new file.

$ echo 'new file' > new.txt
$ git update-index --add --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt

The new version of test.txt and the new file new.txt are temporarily saved.

$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92  new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a  test.txt

Notice that the new version of the check code for test.txt has been modified to 1f7a7a, then you can add the previous file tree object as a subdirectory to the temporary area, use git read-tree to read the previous file tree into the temporary area, --prefix option, you can read the file tree object as a subtree of the current temporary.

$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
$ git cat-file -p 3c4e9cd789d88d8d89c1073707c3585e41b0e614
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579  bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92  new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a  test.txt

Based on the current staging area, users can create a workspace directory, a new version of new.txt and test.txt, and a subdirectory bak containing an older version of test.txt, which will be placed at the top level of the workspace directory with Git stored in the following structure:

Submitted Object

Different file trees in the staging area, sub-file trees (d8329f), initial file trees (0155eb), target file trees (3c4e9c), are used to describe different snapshots of the project. In the snapshots, there is also some information to be saved, such as who saved the snapshot, when it was saved, why it was saved, which is given by the submitting object.

To create a submission object, you can use commit-tree with a check code for the file tree object. Of course, the file tree object must be created first, using the first created file tree object (d8329f).

$ echo 'first commit' | git commit-tree d8329f
fdf4fc3344e67ab068f836878b6c4951e3b15f3d

With the submitted checkcode, you can view the creation time and author information contained in the submission.

$ git cat-file -p fdf4fc3
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <schacon@gmail.com> 1243040974 -0700
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit

The format of the submission object is simple. It contains the top-level file tree of the current project snapshot, the author/submitter information (user.name and user.email configuration, and timestamp), a blank line, and the submission description. After that, two submission objects are fed, each pointing to the previous submission object, which is a chain table structure.

$ echo 'second commit' | git commit-tree 0155eb -p fdf4fc3
cac0cab538b970a37ea1e769cbbde608743bc96d
$ echo 'third commit' | git commit-tree 3c4e9c -p cac0cab
1a410efbd13591db07496601ebc7a059dd55cfe9

At the same time, each submission object can be placed into a different file tree object (the previous three file trees), in which case git log can be used to view Git's submission history, which comes with the latest submission.

$ git log --stat 1a410e
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700

    third commit

 bak/test.txt | 1 +
 1 file changed, 1 insertion(+)
 
commit cac0cab538b970a37ea1e769cbbde608743bc96d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:14:29 2009 -0700

    second commit

 new.txt | 1 +
 test.txt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

commit fdf4fc3344e67ab068f836878b6c4951e3b15f3d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:09:34 2009 -0700

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)

The above operation uses only the bottom level commands to build a Git repository without using any front-end commands, git add or git commit. These front-end commands can implement functions using the bottom level commands, and contain three types of Git objects, blob/file tree/submit. The previous operations will be in the.git/objects directory, saving the corresponding files as follows:

$ find .git/objects -type f
.git/objects/01/55eb4229851634a0f03eb265b69f5a2d56f341 # tree 2
.git/objects/1a/410efbd13591db07496601ebc7a059dd55cfe9 # commit 3
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a # test.txt v2
.git/objects/3c/4e9cd789d88d8d89c1073707c3585e41b0e614 # tree 3
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt v1
.git/objects/ca/c0cab538b970a37ea1e769cbbde608743bc96d # commit 2
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 # 'test content'
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # tree 1
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1

At the same time, these files can be viewed as a pointer, and a graphic can be used to describe these pointers.

Object Storage

As mentioned earlier, each object stored in a Git object database contains an object header that describes the object. The following discussion discusses how blob (file content) objects are saved.

First Git will create the object header, and the first part of the object header will contain,

A space + bytes of file content + terminator (\0)

The object header and file content are merged, and the SHA-1 check value of the file content is calculated. Git uses the zlib tool to compress and save the file content, and then writes it to the object database. All Git objects are saved in a similar way. Submitting objects and file tree objects also have an object header, but the contents saved by these two types of objects are inconsistent with blob objects.

A.3 Git reference

If a user needs to view the submission history of the warehouse based on a submission, git log 1a410e can be used, and 1a410e is a submitted check value and can view the submission history before the submission. To simplify the input of the command, use a simple submission name instead of a check code, which is called a reference (refs). In the.git/refs directory,A simple name corresponding to the check value can be found. In the initial project, there was no identity name in the directory, but only a simple directory structure.

$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/tags
$ find .git/refs -type f

Create a new reference to help with the most recently submitted memory,

$ echo 1a410efbd13591db07496601ebc7a059dd55cfe9 > .git/refs/heads/master

Based on the identity name master, you can view the corresponding submission history.

$ git log --pretty=oneline master
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Git does not encourage users to edit reference identities directly, so it provides a more secure command, git update-ref, to update references.

$ git update-ref refs/heads/master 1a410efbd13591db07496601ebc7a059dd55cfe9

In fact, the master reference is the Git branch, so the user can create a new branch based on the second submission.

$ git update-ref refs/heads/test cac0ca

Users can view the submission history of the test branch directly.

$ git log --pretty=oneline test
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

The overall structure of the Git database is as follows:

When a user runs the GIT branch <branch name>command, it is equivalent to executing the update-ref command, creating a new branch reference using the latest submitted checkcode for the current branch.

HEAD

The HEAD file is a symbol reference of the current branch. It is not a normal reference, so there is no check value, but a pointer to another reference to view the HEAD file.

$ cat .git/HEAD
ref: refs/heads/master

If you run git checkout test,Git will update the HEAD file immediately.

$ cat .git/HEAD
ref: refs/heads/test

If you run git cmmit, a new submission object will be generated, which will point to the previous submission object, and HEAD will also point to the latest submission object. Of course, users can edit the HEAD file directly, but it is recommended to use the more secure command git symbolic-ref, which reads the current state of the HEAD file.

$ git symbolic-ref HEAD
refs/heads/master

You can also set the current state of HEAD.

$ git symbolic-ref HEAD refs/heads/test
$ cat .git/HEAD
ref: refs/heads/test

But you can't set anything.

$ git symbolic-ref HEAD test
fatal: Refusing to point HEAD outside of refs/

Label

In addition to the previous three object types, there is a fourth object type, label, which is similar to the submission object and can include the labeler, date, description, and pointer. The difference is that labels can only point to submissions, not to file trees, and labels cannot be moved, and can always point to the same submission, but a more friendly name can be set.

As we have known in the previous discussion, you can configure two labels, simple label and note label, and create a simple label first.

$ git update-ref refs/tags/v1.0 cac0cab538b970a37ea1e769cbbde608743bc96d

Note labels are more complex, you first need to create a label object, then write a reference to the label object, instead of pointing directly to the submitting object.

$ git tag -a v1.1 1a410efbd13591db07496601ebc7a059dd55cfe9 -m 'test tag'

The check code for the note label has been generated, and you can view the note label based on the check code.

$ cat .git/refs/tags/v1.1
9585191f37f7b0fb9444f35a9bf50de191beadc2

$ git cat-file -p 9585191f37f7b0fb9444f35a9bf50de191beadc2
object 1a410efbd13591db07496601ebc7a059dd55cfe9
type commit
tag v1.1
tagger Scott Chacon <schacon@gmail.com> Sat May 23 16:48:58 2009 -0700

test tag

The object item gives the check code for the submission indicated. Note that this is not a real link relationship, but a submission object is marked. For example, in practice, after the project maintainer has added a GPG public key (stored as a blob object), a note label can be attached.

Reference to remote warehouse

The last type of reference, the application of remote warehouses, when a user adds a remote warehouse, updates to local branches can be pushed to the remote warehouse, that is, all branches under the refs/remotes directory, such as adding an origin remote warehouse and pushing the local master branch.

$ git remote add origin git@github.com:schacon/simplegit-progit.git
$ git push origin master
Counting objects: 11, done.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (7/7), 716 bytes, done.
Total 7 (delta 2), reused 4 (delta 1)
To git@github.com:schacon/simplegit-progit.git
  a11bef0..ca82a6d master -> master

From the output information, it is known that the push is complete, and then the reference identity of the remote branch, refs/remotes/origin/master, can be checked.

$ cat .git/refs/remotes/origin/master
ca82a6dff817ec66f44342007202690a93763949

The biggest difference between a remote branch and a local branch is that, with the read-only property, the user can perform git checkout and switch to a remote branch, but Git cannot move the HEAD, that is, switch to a remote branch, and the user cannot use the commit command to generate submissions directly on the remote branch, so the meaning of a remote warehouse is to save the last known state of the current project.

a.4 Packaging Compression

Continuing with the previous example, the test repository contains 11 objects, 4 blob objects, 3 file tree objects, 3 submission objects, and 1 label object.

$ find .git/objects -type f
.git/objects/01/55eb4229851634a0f03eb265b69f5a2d56f341 # tree 2
.git/objects/1a/410efbd13591db07496601ebc7a059dd55cfe9 # commit 3
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a # test.txt v2
.git/objects/3c/4e9cd789d88d8d89c1073707c3585e41b0e614 # tree 3
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt v1
.git/objects/95/85191f37f7b0fb9444f35a9bf50de191beadc2 # tag
.git/objects/ca/c0cab538b970a37ea1e769cbbde608743bc96d # commit 2
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 # 'test content'
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # tree 1
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1

Git will use the zlib tool to compress these files. After compression, all files will have only 925 bytes. Then, in the warehouse, a larger size will be added to test compression. First, a repo.rb file will be added, which is about 22k in size.

$ curl https://raw.githubusercontent.com/mojombo/grit/master/lib/grit/repo.rb > repo.rb
$ git checkout master
$ git add repo.rb
$ git commit -m 'added repo.rb'
[master 484a592] added repo.rb
 3 files changed, 709 insertions(+), 2 deletions(-)
 delete mode 100644 bak/test.txt
 create mode 100644 repo.rb
 rewrite test.txt (100%)

After viewing the file tree object, confirm the check code corresponding to the repo.rb file, and then use git cat-file to confirm the size of the repo.rb file.

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
100644 blob 033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5    repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b    test.txt

$ git cat-file -s 033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5
22044

Make minor changes to the repo.rb file and submit it it to see the results.

$ echo '# testing' >> repo.rb
$ git commit -am 'modified repo.rb a bit'
[master 2431da6] modified repo.rb a bit
 1 file changed, 1 insertion(+)

Check again for the latest submission of the file tree object,

$ git cat-file -p master^{tree}
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
100644 blob b042a60ef7dff760008df33cee372b945b6e884e    repo.rb
100644 blob e3f094f522629ae358806b17daf78246c27c007b    test.txt

You can see that the check code of the repo.rb file has changed, confirm the size of the file again.

$ git cat-file -s b042a60ef7dff760008df33cee372b945b6e884e
22054

There are two almost identical 22k files on the hard disk. If you save one of them and the other only saves the difference between two similar files, that's how Git handles it. Git saves objects in a loose (loose) format, and Git then packages them into a binary file called a packfile format.This is to save space and improve efficiency. If there are too many loose-formatted objects in the project, users can either run git gc manually or push it to a remote warehouse, using the git gc command first.

$ git gc
Counting objects: 18, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (18/18), done.
Total 18 (delta 3), reused 0 (delta 0)

Looking at the objects directory, you know that a new file has been generated.

$ find .git/objects -type f
.git/objects/bd/9dbf5aae1a3862dd1526723246b20206e5fc37
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/info/packs
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.idx
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack

From the output information, you can see that in the objects directory, only the contents of the files that were not submitted to are retained, that is, the first two lines, because the user did not add them to the submission, and the new files generated are. pack and.idx,.Pack files contain the previous files, which have been deleted,.idx contains the location (offset or index) where the previous files are stored in the.Pack files.At the same time, the size of the previous file was about 15k, and after packaging and compression it was about 7k.

The meaning of packaging compression is that it simplifies the command mode of the file while reducing the storage space of the file. Subsequent files with the same name only need to be saved differently from the previous version. Using the git verify-pack command, you can see the details of the package file.

$ git verify-pack -v .git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.idx
2431da676938450a4d72e260db3bf7b0f587bbc1 commit  223  155 12
69bcdaff5328278ab1c0812ce0e07fa7d26a96d7 commit  214  152 167
80d02664cb23ed55b226516648c7ad5d0a3deb90 commit  214  145 319
43168a18b7613d1281e5560855a83eb8fde3d687 commit  213  146 464
092917823486a802e94d727c820a9024e14a1fc2 commit  214  146 610
702470739ce72005e2edff522fde85d52a65df9b commit  165  118 756
d368d0ac0678cbe6cce505be58126d3526706e54 tag     130  122 874
fe879577cb8cffcdf25441725141e310dd7d239b tree    136  136 996
d8329fc1cc938780ffdd9f94e0d364e0ea74f579 tree     36   46 1132
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree    136  136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree      6   17 1314 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree      8   19 1331 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
0155eb4229851634a0f03eb265b69f5a2d56f341 tree     71   76 1350
83baae61804e65cc73a7201a7252750c76066a30 blob     10   19 1426
fa49b077972391ad58037050f2a75f74e3671e92 blob      9   18 1445
b042a60ef7dff760008df33cee372b945b6e884e blob  22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob      9   20 7262 1 \
  b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob     10   19 7282
non delta: 15 objects
chain length = 1: 3 objects
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok

In the above output, 033b4 data block is the latest version of the repo.rb file and needs to be associated with b042a data block (the initial version of repo.rb). The third item in the data block list, the size of the data block, is known to be 22K for b042a and 9 bytes for 033b4. Therefore, the latest version only contains differences from the original version, and can be repackaged at any time, either Git's automatic operation.It can also be a user's manual operation.

a.5 Remote Mapping

In the previous discussion, the user simply mapped the remote branch to the local branch, but the corresponding operation was quite complex, assuming the user needed to add a remote warehouse.

$ git remote add origin https://github.com/schacon/simplegit-progit

After the above command executes, the corresponding mapping configuration is added to the.git/config file of the local warehouse.

[remote "origin"]
    url = https://github.com/schacon/simplegit-progit
    fetch = +refs/heads/*:refs/remotes/origin/*

The format of the mapping configuration is +<src>: <dst>, SRC is the path of the remote branch, DST is the path of the local branch, +indicates that when pushing a remote branch, a forced push is not allowed if there is a conflict.

When running the git remote add origin command, Git writes all the data under refs/heads/to refs/remotes/origin/. If the remote warehouse has only one branch master, use the following three equivalent commands to access the run log of the master branch, all of which access the final path, refs/remotes/origin/master.

$ git log origin/master
$ git log remotes/origin/master
$ git log refs/remotes/origin/master

When the user only needs to get the remote branch master, the fetch parameter of the.git/config file can be modified.

fetch = +refs/heads/master:refs/remotes/origin/master

With git fetch, the default updates to the above branches can be achieved. If the user only needs a single update, the local and remote branches can be specified in the fetch command as follows:

$ git fetch origin master:refs/remotes/origin/mymaster

Of course, the fetch command also sets the mapping of multiple branches, as follows:

$ git fetch origin master:refs/remotes/origin/mymaster \
    topic:refs/remotes/origin/topic
From git@github.com:schacon/simplegit
! [rejected]   master -> origin/mymaster (non fast forward)
* [new branch]  topic -> origin/topic

In the above output, master and refs/remotes/origin/mymaster conflict, so the get operation is rejected.

Mapping of multiple branches can also be configured in the.git/config file, but wildcards are not allowed in specific branch names.

[remote "origin"]
    url = https://github.com/schacon/simplegit-progit
    fetch = +refs/heads/master:refs/remotes/origin/master
    fetch = +refs/heads/experiment:refs/remotes/origin/experiment

In a path (which can be viewed as a namespace), wildcards are allowed, and all branches under that path will be mapped as follows:

[remote "origin"]
    url = https://github.com/schacon/simplegit-progit
    fetch = +refs/heads/master:refs/remotes/origin/master
    fetch = +refs/heads/qa/*:refs/remotes/origin/qa/*

Push

If you need to push the local master branch to the remote branch qa/master, you can use the

$ git push origin master:refs/heads/qa/master

Of course, push operations can also be configured in the.git/config file. When running git push origin, the configuration file settings are used by default.

[remote "origin"]
    url = https://github.com/schacon/simplegit-progit
    fetch = +refs/heads/*:refs/remotes/origin/*
    push = refs/heads/master:refs/heads/qa/master

It should be noted that with mapping, it is not possible to get from one warehouse to another by default, and the remote setup of one step must be done manually by the user.

Delete reference

Using mapping relationships, you can also delete references to remote warehouses as follows:

$ git push origin :topic

Since the format of the mapping relationship is <src>: <dst>, if the <src>parameter is empty, the mapping relationship equivalent to the remote branch top is cancelled for the purpose of disguised deletion. Starting with Git 1.7.0, you can also use the following command to delete the remote branch.

$ git push origin --delete topic

a.6 Transport Protocol

Git can transfer data from a remote warehouse in two ways, one read-only and the other intelligent.

Read-only Protocol

If the remote warehouse only needs HTTP for read-only functionality, this is called the read-only protocol, and the server does not need to execute the Git tool during the data transfer process, because the get operation is a set of HTTP get requests, and the client must know the directory structure of the Git warehouse on the server beforehand. It is worth noting that the read-only protocol is rarely used because its security is difficult to guarantee and it cannot be configuredPrivate warehouses, so most Git hosts (cloud hosts and internal hosts) refuse to use such protocols.

Here's how to use the read-only protocol. First, clone a simple repository.

$ git clone http://server/simplegit-progit.git

The clone command first gets the info/refs file (which is generated by the update-server-info command), and the user needs to enable the post-receive hook script so that the HTTP transport can work properly.

=> GET info/refs
ca82a6dff817ec66f44342007202690a93763949  refs/heads/master

At this point the user will get a list of remote warehouse references (and corresponding SHA-1 checkcodes) and can look up HEAD references.

=> GET HEAD
ref: refs/heads/master

Based on the HEAD reference, it is known that the current main branch is master, and from the info/refs file, the latest submission is ca82a6 submission object, from which cloning will begin.

=> GET objects/ca/82a6dff817ec66f44342007202690a93763949
(179 bytes of binary data)

Based on a static HTTP get request, the latest submission is not found until it is a binary data, so it needs to be decompressed to see the details of the submission.

$ git cat-file -p ca82a6dff817ec66f44342007202690a93763949
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

    changed the version number

From the details of the latest submission, two more remote references were found, cfda3b is the file tree containing the latest submission, 085bb3 is the parent submission of the latest submission, and continues to get the parent submission.

=> GET objects/08/5bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
(179 bytes of data)

Continue to get the file tree object,

=> GET objects/cf/da3bf379e4f8dba8717dee55aab78aef7f4daf
(404 - Not Found)

Get operation failed, 404 indicates no file tree object was found, either because the file tree object is placed in another standby repository or because the file tree object is a packaged file, get Git's list of standby repositories first.

=> GET objects/info/http-alternates
(empty file)

If the list is empty, you are not using the standby repository and check the package file again. That is, you get the objects/info/packs file, which contains the list of packaged files.

=> GET objects/info/packs
P pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack

In the remote warehouse at this time, there is only one package file, get and check the corresponding idx file, whether there is a file tree object, if there are multiple package files, you need to check one by one.

=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.idx
(4k of binary data)

In the idx file, find the file tree object, and the offset of the file tree object in the pack package file, get the pack package file,

=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack
(13k of binary data)

Unzip the file tree object from the pack file until the local cloning of this master branch is complete.

Smart Protocol

Because the read-only protocol is inefficient and does not support writing operations to remote warehouses, the smart protocol is a more general data transfer scheme, but on the server, Git tools need to be executed to complete the corresponding functions, where two sets of data transfers, data upload and data download, will be implemented.

Data Upload

In order to upload data from a remote warehouse, send-pack operations need to be run on the client side and receive-pack operations on the server side.

SSH

Assuming that using the SSH protocol, when the client runs the git push origin master command, the send-pack operation will be triggered, and the SSH connection to the server will be initialized first, similar to the following commands,

$ ssh -x git@server "git-receive-pack 'simplegit-progit.git'"
00a5ca82a6dff817ec66f4437202690a93763949 refs/heads/master□report-status \
    delete-refs side-band-64k quiet ofs-delta \
    agent=git/2:2.1.1+github-607-gfba4028 delete-refs
0000

The git-receive-pack command responds immediately when a client requests to push each reference of the master branch, while the first line of the output information above lists the services that the server can provide, such as report-status,delete-refs, and so on, and at the beginning of each line of text in the output information will contain a four-digit hexadecimal number to identify the remaining transport size.For example, the first row has a starting value of 00a5 and a hexadecimal value of 165, which means that 165 bytes are left in the subsequent transfer data, and the second row has a value of 0000, which means that the reference object of the remote warehouse has been updated.

In the send-pack operation, it will also confirm whether the service side contains a specific submission. When updating each remote reference, send-pack will inform receive-pack of relevant information, such as the user needs to push the master branch and add an experiment ation branch. Send-pack will give the following information.

0076ca82a6dff817ec66f44342007202690a93763949  15027957951b64cf874c3557a0f3547bd83b3ff6
\ 
    refs/heads/master report-status
006c0000000000000000000000000000000000000000  cdfdb42577e2506715f8cfeacdbabc092bf63e8d
\ 
    refs/heads/experiment
0000

When a remote reference is updated, the old and new checkcodes are replaced, and each row of capital gives the size of the remaining transmitted data, while the full 0 check value indicates that this is a new remote reference. If a remote reference is deleted, the full 0 check value will be on the right.

The client then packages all the objects to be pushed (which do not exist on the server side), packages them into a compressed file, and the server responds to the data transfer, that is, the transfer succeeds (or fails).

000eunpack ok

HTTP(S)

HTTP-based data upload, similar to previous operations, requires the following connection initialization requests, except that the connection handshake is different.

=> GET http://server/simplegit-progit.git/info/refs?service=git-receive-pack
001f# service=git-receive-pack
00ab6c5f0e45abd7832bf23074a333f739977c9e8188 refs/heads/master□report-status \
    delete-refs side-band-64k quiet ofs-delta \
    agent=git/2:2.1.1~vmg-bitmaps-bugaloo-608-g116744e
0000

The above command completes the handshake between the client and the server, after which the client initiates another post request, and the data is provided by send-pack.

=> POST http://server/simplegit-progit.git/git-receive-pack

In the post request, the output information of the send-pack is included, along with the packaged file, and the server will then give an HTTP response, that is, the transfer succeeds or fails.

Data Download

When obtaining data, the package fetch-pack contains and upload-pack processing, and the client initializes a fetch-pack processing and docks the upload-pack processing on the server side.

SSH

If you get data based on the SSH protocol, fetch-pack will use the following commands,

$ ssh -x git@server "git-upload-pack 'simplegit-progit.git'"

When a fetch-pack is connected, upload-pack will affect the following information, which is quite different from previous uploads, except in different directions. When fetch-pack receives a response from the server, it can request the required remote reference again based on the check code contained in the response information. Upload-pack will package the data needed by the client into a compressed file and pass it to the client.

003cwant ca82a6dff817ec66f44342007202690a93763949 ofs-delta
0032have 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
0009done
0000

HTTP(S)

In an acquisition operation based on the HTTP protocol, completing the handshake mechanism requires two HTTP requests, the first get request, to implement the read-only protocol as follows:

=> GET $GIT_URL/info/refs?service=git-upload-pack
001e# service=git-upload-pack
00e7ca82a6dff817ec66f44342007202690a93763949 HEAD□multi_ack thin-pack \
    side-band side-band-64k ofs-delta shallow no-progress include-tag \
    multi_ack_detailed no-done symref=HEAD:refs/heads/master \
    agent=git/2:2.1.1+github-607-gfba4028
003fca82a6dff817ec66f44342007202690a93763949 refs/heads/master
0000

The second HTTP post request executes the service-side git-upload-pack tool, which returns the data required by the client.

=> POST $GIT_URL/git-upload-pack HTTP/1.0
0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7
0032have 441b40d833fdfa93eb2908e52742248faf0ee993
0000

The server also provides the status of the data request, the success or failure of the request, and the package file included when the request succeeds.

a.7 Maintenance and Data Recovery

Sometimes users need to complete some cleanup tasks, such as creating a more compact warehouse, cleaning up an imported warehouse, or restoring some lost work, which are discussed below.

Maintain

The gc command comes with the auto option to automate warehouse maintenance, but in most cases it is a fictitious command. If the warehouse contains too many loose objects (unpacked compression) or packed objects (packed compression), it can be maintained using the git gc command, which means garbage collection. This command can be completed by collecting loose objects for packaging compression and placing multiple packed objects in oneLarger pack object, delete objects that have not been submitted for access for months.

Users can call the GC command manually and configure gc.auto and gc.autopacklimit at the same time. When the number of loose objects and pack objects exceeds the configuration value, GC will process automatically. In addition, the GC command can also package warehouse references into a single file, assuming that the warehouse contains the following branches and labels.

$ find .git/refs -type f
.git/refs/heads/experiment
.git/refs/heads/master
.git/refs/tags/v1.0
.git/refs/tags/v1.1

If these files are no longer needed, run git gc and move them into the.git/packed-refs file.

$ cat .git/packed-refs
# pack-refs with: peeled fully-peeled
cac0cab538b970a37ea1e769cbbde608743bc96d refs/heads/experiment
ab1afef80fac8e34258ff41fc1b867c702daa24b refs/heads/master
cac0cab538b970a37ea1e769cbbde608743bc96d refs/tags/v1.0
9585191f37f7b0fb9444f35a9bf50de191beadc2 refs/tags/v1.1
^1a410efbd13591db07496601ebc7a059dd55cfe9

If the user updates the references above at this time, Git will not edit them, but write a new reference file in refs/heads. In order to get a correct check value, Git will get the check value of the same reference file (old version) in packed-refs and calculate the check value for the new reference file. At the last line of the above output, ^ indicates that the previous line is a note label, ^ is a reference afterSubmit the check value, which is the submission indicated by the note label on the previous line.

data recovery

If submissions are lost during management, such as forcing a working branch to be deleted or forcing the branch pointer to move, resulting in a loss of submissions, the following example is a mandatory reset of the master branch of the test warehouse. First, check the original submission of the master branch.

$ git log --pretty=oneline
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Force the branch pointer to move,

$ git reset --hard 1a410efbd13591db07496601ebc7a059dd55cfe9
HEAD is now at 1a410ef third commit

$ git log --pretty=oneline
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Comparing the output information shows that two submissions have been lost, how can you recover them? There is a chain table between the first submissions. Of course, all submissions can be restored based on the chain table, but this method is too inefficient. You can use the GIT reflog command at this time. Because Git backs up every change of the HEAD pointer, every submission or switch branch, the reflog is recorded, and git update-ref is used at the same time.Command, the user can add reflog records manually, so git reflog has recovery capabilities.

$ git reflog
1a410ef HEAD@{0}: reset: moving to 1a410ef
ab1afef HEAD@{1}: commit: modified repo.rb a bit
484a592 HEAD@{2}: commit: added repo.rb

From the output information, two missing submissions have occurred, and git log -g can also be used to view the missing submissions.

$ git log -g
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Reflog: HEAD@{0} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:22:37 2009 -0700
    third commit

commit ab1afef80fac8e34258ff41fc1b867c702daa24b
Reflog: HEAD@{1} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700
    modified repo.rb a bit

The next submission has been lost. Use the recover-branch subcommand to recover the lost submission.

$ git branch recover-branch ab1afef

$ git log --pretty=oneline recover-branch
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

From the output information, it is known that the two lost submissions have been recovered, then the more difficult recovery is challenged, the reflog record is deleted completely, and the recover-branch previously recovered is deleted, returning to the original state of the two lost submissions.

$ git branch -D recover-branch
$ rm -Rf .git/logs/

Since reflog records are stored in the.git/logs/directory, only git fsck can be used to recover them. This command checks the integrity of the database and, with the full option, displays all independent objects that are not associated with other objects.

$ git fsck --full
Checking object directories: 100% (256/256), done.
Checking objects: 100% (18/18), done.
dangling blob d670460b4b4aece5915caf5c68d12f560a9fe3e4
dangling commit ab1afef80fac8e34258ff41fc1b867c702daa24b
dangling tree aea790b9a58f6cf6f2804eeac9f0abbe9631e4c9
dangling blob 7108f7ecb345ee9d0084193f147cdad4d2998293

You can see that the ab1afef submission has occurred, and since 484a592 submission is linked to the ab1afef submission, it will be restored together and can be restored again using the previous recover-branch command.

Remove Object

Some of Git's functions can also cause problems, such as git clone downloading all the history of the project repository. If the project only contains source code, Git can compress data efficiently. If a collaborator adds a single giant file to the project, then every clone must download the giant file. Even in subsequent submissions, removing the giant file will not make sense because it is packagedIncluded in the submission history, it will remain in the project repository. Similar issues arise when migrating SVN or Perforce repositories because users cannot obtain the complete history of these systems, and how do you handle redundant data that is not included in the submission history?

Note that the following methods can disrupt the submission history, modify all submissions associated with mega-files, and the user needs to notify all submitters that these new submissions have to be derived. This is a fairly complex task for collaborative development. To demonstrate this type of problem, add a mega-file in the test repository, delete it in subsequent submissions, and then remove it from the repositoryDelete the giant file as follows:

$ curl https://www.kernel.org/pub/software/scm/git/git-2.1.0.tar.gz > git.tgz
$ git add git.tgz
$ git commit -m 'add git tarball'
[master 7b30847] add git tarball
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 git.tgz

Delete the giant file git.tgz,

$ git rm git.tgz
rm 'git.tgz'
$ git commit -m 'oops - removed large tarball'
[master dadf725] oops - removed large tarball
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 git.tgz

Use the gc command to view the hard disk space occupied by the database.

$ git gc
Counting objects: 17, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (17/17), done.
Total 17 (delta 1), reused 10 (delta 0)

Run count-objects again to see the hard disk space occupied by different reference objects in the database.

$ git count-objects -v
count: 7
size: 32
in-pack: 17
packs: 1
size-pack: 4868
prune-packable: 0
garbage: 0
size-garbage: 0

The size-pack gives the size of all the pack files, close to 5Mbyte (in kbyte). By comparison, the previously added giant files are also saved in the pack file. You can use the git verify-pack command to view the saved records of the giant files from all the Pack files as follows:

$ git verify-pack -v .git/objects/pack/pack-29...69.idx \
  | sort -k 3 -n \
  | tail -3
dadf7258d699da2c8d89b09ef6670edb7d5f91b4 commit 229 159 12
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   22044 5792 4977696
82c99a3e86bb1267b236a4b6eff7868d97489af1 blob   4975916 4976258 1438

The blob object at the bottom of the output information is 5Mbyte, and the rev-list command confirms the true details of the blob object.

$ git rev-list --objects --all | grep 82c99a3
82c99a3e86bb1267b236a4b6eff7868d97489af1 git.tgz

The user then needs to delete the information associated with the giant file in all the file trees, that is, to find submissions related to the giant file.

$ git log --oneline --branches -- git.tgz
dadf725 oops - removed large tarball
7b30847 add git tarball

Repair the above submission to completely remove the huge files from the repository.

$ git filter-branch --index-filter \
'git rm --ignore-unmatch --cached git.tgz' -- 7b30847^..
Rewrite 7b30847d080183a1ab7d18fb202473b3096e9f34 (1/2)rm 'git.tgz'
Rewrite dadf7258d699da2c8d89b09ef6670edb7d5f91b4 (2/2)
Ref 'refs/heads/master' was rewritten

The index-filter option, which only modifies the temporary area (index), git rm--cache deletes files from the temporary area instead of the hard disk. This method is faster. It simply restores each submission to the temporary area without restoring the entire version of the project. The ignore-unmatch option tells the GIT RM command that if the file to be deleted does not exist, there is no need to report errors, and finally specifies that starting with submission at 7b30847,Fix the entire submission history by excluding records associated with giant files.

At this point, the record of the giant file has been cleared from the submission history, and the reflog record needs to be cleared, and a new set of references needs to be added to the.git/refs/original.

$ rm -Rf .git/refs/original
$ rm -Rf .git/logs/
$ git gc
Counting objects: 15, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (15/15), done.
Total 15 (delta 1), reused 12 (delta 0)

a.8 environment variable

Git typically runs in a bash shell, so you can set some shell environment variables and configure default actions. The following are common environment variables for different functions.

Global Configuration

The following are the environment variables on which the Git system depends.

GIT_EXEC_PATH
Git finds the path of subcommands, such as git-commit,git-diff, and so on, and uses git --exec-path to examine the variable.

HOME
Git will use this variable to find the global configuration file. If the user chooses to unpack Git, he or she will need to modify this variable after completing the global configuration.

PREFIX
Similar to the operating system configuration, such as $PREFIX/etc/gitconfig, you can find Git's global configuration file.

GIT_CONFIG_NOSYSTEM
Setting this variable will invalidate the OS layer configuration for scenarios where the system layer configuration interferes with Git commands.

GIT_PAGER
Paging of the output information of the Git command can be controlled, and if this variable is not set, PAGER will be used.

GIT_EDITOR
When the user needs to edit the text (such as submitting a description), Git calls the variable, specifies the text editor, and if the variable is not set, EDITOR is used.

Warehouse Configuration

The interface configuration for the current warehouse is given below.

GIT_DIR
This is the path to save the.Git directory. If this variable is not set, Git will start at ~ (user root) or /(system root) to find the.Git directory.

GIT_CEILING_DIRECTORIES
You can control the lookup of.Git directories. If the speed of retrieving directories is too slow (such as tape drives, slow network directories), this variable can cause Git to discard directory retrieval that is too slow.

GIT_WORK_TREE
This is the path to store the root directory of the warehouse workspace. If the--git-dir command option or GIT_DIR variable is specified, and the--work-tree command option is not specified, and the GIT_WORK_TREE variable (the variable core.work tree of the configuration file), the current working directory will be treated as the top level of the current file tree.

GIT_INDEX_FILE
Save path to index file

GIT_OBJECT_DIRECTORY
Save path to.git/objects directory

GIT_ALTERNATE_OBJECT_DIRECTORIES
This is a colon-separated list in the format / dir/one:/dir/two:...Git can retrieve warehouse objects that are not included in the GIT_OBJECT_DIRECTORY directory in the path specified by this variable. If a project contains multiple large files with consistent content, using this variable can prevent this from happening.

Path Configuration

Used to specify the path used by Git tools, which can contain wildcards,

GIT_GLOB_PATHSPECS and GIT_NOGLOB_PATHSPECS
You can control the default usage of wildcards in the path configuration. If GIT_GLOB_PATHSPECS is set to 1, wildcards can be used. If GIT_NOGLOB_PATHSPECS is set to 1, only a single file, such as *.c, can be matched. A file named *.c can only match, not any file ending in.C, and such variables can also limit the case of the path start character, such as: (glob)*.c

GIT_LITERAL_PATHSPECS
Invalidates the configuration of GIT_GLOB_PATHSPECS and GIT_NOGLOB_PATHSPECS, does not use wildcards, and restrictions on path prefixes.

GIT_ICASE_PATHSPECS
Ignore the case of characters in the path.

Submit Configuration

Git typically uses git-commit-tree to generate the final submission object, which requires some support for environment variables, as follows:

GIT_AUTHOR_NAME
Gives information about the author item, the author name

GIT_AUTHOR_EMAIL
Gives information about the author item, the author's mailbox address

GIT_AUTHOR_DATE
Gives information about the author item, which is the date the modification was submitted

GIT_COMMITTER_NAME
Gives information about the committer item, the submitter's name

GIT_COMMITTER_EMAIL
Gives information about the committer item, which is the submitter's mailbox address

GIT_COMMITTER_DATE
Gives information about the committer item, which is the submission date

EMAIL
If the user.email configuration variable is not set, it will be used, and if it is not set, the submission will use the system account name and host name

network configuration

Git uses the curl library to communicate with Http over the network, and sets the GIT_CURL_VERBOSE variable to pass all messages generated by the curl library to Git, such as the output information of curl-v.

GIT_SSL_NO_VERIFY
Discard the validation of the SSL certificate, sometimes the variable must be set, such as when the user uses self-signed, an open signature, or when the user does not install a full certificate during the configuration of the Git server.

GIT_HTTP_LOW_SPEED_LIMIT and GIT_HTTP_LOW_SPEED_TIME
If the Http transfer rate is lower than the GIT_HTTP_LOW_SPEED_LIMIT limit or the transfer time exceeds the GIT_HTTP_LOW_SPEED_TIME limit, Git can abort the current operation and these variables can override the configuration variables http.lowSpeedLimit and http.lowSpeedTime.

GIT_HTTP_USER_AGENT
This variable can be used for Http transport, set network proxy, default git/2.0.0

Differences and Merges

GIT_DIFF_OPTS
The git diff command can be controlled, and the output line of text is the same as the -u or--unified=command option.

GIT_EXTERNAL_DIFF
You can override the configuration variable diff.external, which, if set, will invoke the external comparison tool specified by the variable when running git diff.

GIT_DIFF_PATH_COUNTER and GIT_DIFF_PATH_TOTAL
In the external alignment tool specified by GIT_EXTERNAL_DIFF or diff.external, a path parameter can be attached, and the path is not merged. GIT_DIFF_PATH_COUNTER can specify a counter with an accumulative value of 1. After each path is compared, the counter is added 1, and GIT_DIFF_PATH_TOTAL can limit the number of alignment paths.

GIT_MERGE_VERBOSITY
The output information of a recursive merge can be controlled by setting the following values, which are 2 by default.

0, no output unless error message occurs
1, show conflict information only
2, Display changes to files
3, show unchanged (skipped) files
4, show all processed paths
5, display the above information, and debug information

Debugging Configuration

Git provides a complete tracking operation that turns on debugging when the user needs it. The following variables can be used for configuration values.

true/1/2, trace information written to stderr
Absolute paths starting with/to write trace information to specific files

GIT_TRACE
You can control common tracing functions, regardless of the type of tracing, including kana extensions, calls to subcommands,

$ GIT_TRACE=true git lga
20:12:49.877982 git.c:554         trace: exec: 'git-lga'
20:12:49.878369 run-command.c:341 trace: run_command: 'git-lga'
20:12:49.879529 git.c:282         trace: alias expansion: lga => 'log' '--graph' '--pretty=oneline' '--abbrev-commit' '--decorate' '--all'
20:12:49.879885 git.c:349         trace: built-in: git 'log' '--graph' '--pretty=oneline' '--abbrev-commit' '--decorate' '--all'
20:12:49.899217 run-command.c:341 trace: run_command: 'less'
20:12:49.899675 run-command.c:192 trace: exec: 'less'

GIT_TRACE_PACK_ACCESS
The tracking of Pack files can be controlled. In the output information, the first is the currently accessed pack file, and the second is the file offset within the pack file.

$ GIT_TRACE_PACK_ACCESS=true git status
20:10:12.081397 sha1_file.c:2088 .git/objects/pack/pack-c3fa...291e.pack 12
20:10:12.081886 sha1_file.c:2088 .git/objects/pack/pack-c3fa...291e.pack 34662
20:10:12.082115 sha1_file.c:2088 .git/objects/pack/pack-c3fa...291e.pack 35175    
# [...]
20:10:12.087398 sha1_file.c:2088 .git/objects/pack/pack-e80e...e3d2.pack 56914983
20:10:12.087419 sha1_file.c:2088 .git/objects/pack/pack-e80e...e3d2.pack 14303666
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

GIT_TRACE_PACKET
Enables tracing of network packets,

$ GIT_TRACE_PACKET=true git ls-remote origin
20:15:14.867043 pkt-line.c:46 packet: git< # service=git-upload-pack
20:15:14.867071 pkt-line.c:46 packet: git< 0000
20:15:14.867079 pkt-line.c:46 packet: git< 97b8860c071898d9e162678ea1035a8ced2f8b1f HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.0.4
20:15:14.867088 pkt-line.c:46 packet: git< 0f20ae29889d61f2e93ae00fd34f1cdb53285702 refs/heads/ab/add-interactive-show-diff-func-name
20:15:14.867094 pkt-line.c:46 packet: git< 36dc827bc9d17f80ed4f326de21247a5d1341fbc refs/heads/ah/doc-gitk-config  
# [...]

GIT_TRACE_PERFORMANCE
You can create a log of performance data that will contain the execution time of each git command.

$ GIT_TRACE_PERFORMANCE=true git gc
20:18:19.499676 trace.c:414 performance: 0.374835000 s: git command: 'git' 'pack-refs' '--all' '--prune'
20:18:19.845585 trace.c:414 performance: 0.343020000 s: git command: 'git' 'reflog' 'expire' '--all'
Counting objects: 170994, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (43413/43413), done.
Writing objects: 100% (170994/170994), done.
Total 170994 (delta 126176), reused 170524 (delta 125706)
20:18:23.567927 trace.c:414 performance: 3.715349000 s: git command: 'git' 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-49190-pack'
20:18:23.584728 trace.c:414 performance: 0.000910000 s: git command: 'git' 'prune-packed'
20:18:23.605218 trace.c:414 performance: 0.017972000 s: git command: 'git' 'update-server-info'
20:18:23.606342 trace.c:414 performance: 3.756312000 s: git command: 'git' 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
Checking connectivity: 170994, done.
20:18:25.225424 trace.c:414 performance: 1.616423000 s: git command: 'git' 'prune' '--expire' '2.weeks.ago'
20:18:25.232403 trace.c:414 performance: 0.001051000 s: git command: 'git' 'rerere' 'gc'
20:18:25.233159 trace.c:414 performance: 6.112217000 s: git command: 'git' 'gc'

GIT_TRACE_SETUP
Displays the information Git retrieves from the warehouse and server environments.

$ GIT_TRACE_SETUP=true git status 
20:19:47.086765 trace.c:315 setup: git_dir: .git
20:19:47.087184 trace.c:316 setup: worktree: /Users/ben/src/git
20:19:47.087191 trace.c:317 setup: cwd: /Users/ben/src/git
20:19:47.087194 trace.c:318 setup: prefix: (null)
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

Miscellaneous Configuration

GIT_SSH
This variable can set an SSH application tool. When Git connects to the SSH host, the SSH command will not be used, but the application tool specified by the variable, such as $GIT_SSH [username@]host [-p <port>] <command>, will be used. Note that this is not an easy way to encapsulate the SSH custom command, it does not support the extension of command line parameters, or the user configures GIT_SSH in a script and then customizes the command, of course.Custom commands for SSH can be encapsulated in the ~/.ssh/config file.

GIT_ASKPASS
The configuration variable core.askpass can be overridden. When Git asks the user for a certificate, it calls the application tool specified by the variable, and text can be passed to the application tool, while the output information of the application tool can be written to stdout.

GIT_NAMESPACE
Access to the namespace that controls the warehouse application is the same as the namespace option, which is typically used on the server side. When administrators need to create multiple copies of a single warehouse, they can choose to save only the warehouse references of multiple copies independently.

GIT_FLUSH
When incrementally written to stdout, this variable forces Git to use an unplugged IO operation. If the configuration value is 1, it triggers Git to periodically empty stdout. If the configuration value is 0,Git will not empty stdout. If the variable is not set, the emptying operation will depend on the current Git command and the current mode of stdout.

GIT_REFLOG_ACTION
You can specify the description text for the reflog as follows:

$ GIT_REFLOG_ACTION="my action" git commit --allow-empty -m 'my message'
[master 9e3d55a] my message
$ git reflog -1
9e3d55a HEAD@{0}: my action: my message

Ordering too much

80 original articles published. 10% praised. 190,000 visits+

Private letter follow

Topics: git ssh Database github

Programmer Think

Git Dehydration Edition [10. Internal Mechanism]

a.1 underlying command

A.2 Git object

File Tree Object

Submitted Object

Object Storage

A.3 Git reference

HEAD

Label

Reference to remote warehouse

a.4 Packaging Compression

a.5 Remote Mapping

Push

Delete reference

a.6 Transport Protocol

Read-only Protocol

Smart Protocol

Data Upload

SSH

HTTP(S)

Data Download

SSH

HTTP(S)

a.7 Maintenance and Data Recovery

Maintain

data recovery

Remove Object

a.8 environment variable

Global Configuration

Warehouse Configuration

Path Configuration

Submit Configuration

network configuration

Differences and Merges

Debugging Configuration

Miscellaneous Configuration

Hot Topics