[learn and forget] git principle - 16. Git object [Tree object]

Posted by tdeez173 on Fri, 18 Feb 2022 01:29:23 +0100

1. Tree object introduction

The next Git object type to discuss is the tree object, which solves the problem of storing file names. A tree object records file names and also lets us group multiple files together.

Git stores content in a manner similar to a UNIX file system, but somewhat simplified. All content is stored as tree objects and blob objects: a tree object corresponds to a UNIX directory entry, and a blob object roughly corresponds to the contents of a file.

A tree object contains one or more records, each of which refers to a blob object or to another tree object. Each record holds a SHA-1 pointer to the blob object or subtree object, together with the associated mode, type, and file name.

As shown below:

# File mode, object type, SHA-1 pointer of object, file name
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
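As a minimal sketch of where such a record comes from, the commands below build a one-entry tree in a scratch repository and print it. The temporary directory is an assumption made for illustration; the commands themselves are explained step by step later in this article.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Create new.txt and stage it (this also stores its content as a blob).
printf 'new file\n' > new.txt
git update-index --add new.txt

# Write the staged entries out as a tree object and print its records.
tree=$(git write-tree)
entry=$(git cat-file -p "$tree")
echo "$entry"
```

Each printed line has the four fields described above: mode, type, SHA-1, and file name.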

The storage method of Tree object is shown in the following figure:

2. Tree object description

(1) Initialize a new local repository

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning
$ git init
Initialized empty Git repository in J:/git-repository/git_learning/.git/

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll -a
total 8
drwxr-xr-x 1 L 197121 0  4 November 14:50 ./
drwxr-xr-x 1 L 197121 0  4 October 20:23 ../
drwxr-xr-x 1 L 197121 0  4 November 14:50 .git/

(2) Create a tree object (focus)

1) Create a new file, then write it to the local repository

For example: create a new file test.txt with the content version 1

# create a file
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo "version 1" >> test.txt

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll
total 1
-rw-r--r-- 1 L 197121 10  4 November 14:57 test.txt

# view file contents
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ cat test.txt
version 1

2) Write the test.txt file to the local repository

# 1. Write test.txt to the local repository (the Git object database)
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git hash-object -w ./test.txt
83baae61804e65cc73a7201a7252750c76066a30

# 2. Check the content of Git database and you can see that a blob object is added
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t 83baae61804e65cc73a7201a7252750c76066a30
blob

# 3. View the content of blob object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30
version 1

These steps are the same as those used for blob objects.

At this point, the content of test.txt is stored in the local Git repository.
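To see why identical content always yields the same key, the hash can be reproduced by hand: Git hashes a short header (the object type, the content length in bytes, and a NUL byte) followed by the content. The sketch below assumes the sha1sum utility is available (it ships with GNU coreutils and with Git Bash) and uses a scratch repository in a temporary directory.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Hash the header "blob <size>\0" plus the content by hand.
# "version 1\n" is 10 bytes long.
manual=$(printf 'blob 10\0version 1\n' | sha1sum | cut -d' ' -f1)

# Let Git hash the same content (read from standard input, nothing is stored).
by_git=$(printf 'version 1\n' | git hash-object --stdin)

echo "$manual"   # 83baae61804e65cc73a7201a7252750c76066a30
echo "$by_git"   # 83baae61804e65cc73a7201a7252750c76066a30
```

Both values match the key printed by git hash-object above, because the key depends only on the content, not on the file name.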

3) Create a tree object

Usually, Git creates a tree object from the state of the staging area (the index file). So we first need to add the file to the staging area, which creates the index file.

Tip 1:

The index file lives in the .git directory. A newly initialized local repository has no index file; it is created automatically in the .git directory the first time data is added to the staging area.

The contents of a newly initialized .git directory are shown below: there is no index file.

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll .git/
total 7
-rw-r--r-- 1 L 197121 130  4 November 14:50 config
-rw-r--r-- 1 L 197121  73  4 November 14:50 description
-rw-r--r-- 1 L 197121  23  4 November 14:50 HEAD
drwxr-xr-x 1 L 197121   0  4 November 14:50 hooks/
drwxr-xr-x 1 L 197121   0  4 November 14:50 info/
drwxr-xr-x 1 L 197121   0  4 November 14:59 objects/
drwxr-xr-x 1 L 197121   0  4 November 14:50 refs/
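The same observation can be checked with a small script. This is a sketch in a scratch repository (the temporary directory is an assumption); it uses git update-index, which is introduced a little further down.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Fresh repository: .git/index does not exist yet.
if [ -e .git/index ]; then before=present; else before=absent; fi

# Staging a file for the first time creates the index file.
printf 'version 1\n' > test.txt
git update-index --add test.txt
if [ -e .git/index ]; then after=present; else after=absent; fi

echo "before: $before, after: $after"
```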

Tip 2:

You can view the files in the staging area with the git ls-files command.

The options are as follows, with the short form in parentheses:

  • --cached (-c): show files in the staging area. This is the default behavior of git ls-files.
  • --modified (-m): show modified files.
  • --deleted (-d): show deleted files.
  • --others (-o): show files that are not tracked by Git.
  • --stage (-s): show the mode, the blob object, and the stage number of each staged file, so that we can then read the content of the corresponding file in the staging area.

For example: git ls-files -c or git ls-files --cached (the same applies to the other options)

We often use the git ls-files -s command to view the file information in the staging area.
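The following sketch shows how the options differ. It assumes a scratch repository in a temporary directory, and the four file names are made up for the demonstration.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Stage three files (hypothetical names).
printf 'a\n' > staged.txt
printf 'b\n' > changed.txt
printf 'c\n' > removed.txt
git update-index --add staged.txt changed.txt removed.txt

printf 'more\n' >> changed.txt   # modified after staging
rm removed.txt                   # deleted from the working directory
printf 'd\n' > untracked.txt     # never staged

cached=$(git ls-files --cached)      # everything in the staging area
modified=$(git ls-files --modified)  # includes changed.txt
deleted=$(git ls-files --deleted)    # removed.txt
others=$(git ls-files --others)      # untracked.txt

printf 'cached:\n%s\nmodified:\n%s\ndeleted:\n%s\nothers:\n%s\n' \
  "$cached" "$modified" "$deleted" "$others"
```

Note that the staging area still lists removed.txt: deleting a file from the working directory does not change the index.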

Next, we can easily create our own tree objects with the plumbing commands update-index, write-tree, read-tree, and so on.

# 1. Check the current state of the staging area: there is no output,
# which means the staging area does not contain any files
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s

# 2. Add the test.txt file to the staging area
# with the git update-index command
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git update-index --add --cacheinfo 100644 \
> 83baae61804e65cc73a7201a7252750c76066a30 test.txt

# 3. Check the current status of the staging area again, and you can see that there is a file in the staging area.
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 83baae61804e65cc73a7201a7252750c76066a30 0       test.txt
# This is why we wrote test.txt to the local repository first:
# the hash key of the file is needed to add it to the staging area.
### This is also where the file name is associated with the file's hash key. (key points)

Command description:

  • To create a tree object, you first need to stage some files.
    The plumbing command git update-index adds a single file to the staging area.
  • --add option: the file is not yet in the staging area, so the --add option is required the first time a file is added.
  • --cacheinfo option: the entry is taken from the Git database (the blob written in the previous step) rather than from a file in the current working directory, so the --cacheinfo option is required.
  • Finally, you need to specify the file mode, the SHA-1, and the file name.

File mode description:

  • 100644: indicates that this is an ordinary file. (the file mode of blob object is generally 100644)
  • 100755: represents an executable file.
  • 120000: represents a symbolic link.
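As an illustration of the mode field, the sketch below stages a blob as an executable file. It assumes a scratch repository in a temporary directory, uses the comma-separated form of --cacheinfo (mode,object,path), and the name run.sh is hypothetical.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Store a small script as a blob without creating a file in the working directory.
sha=$(printf '#!/bin/sh\necho hi\n' | git hash-object -w --stdin)

# Stage the blob under the (hypothetical) name run.sh with the executable mode.
git update-index --add --cacheinfo 100755,"$sha",run.sh

staged=$(git ls-files -s)
echo "$staged"
```

The first column of the output is the mode passed to --cacheinfo, here 100755 instead of 100644.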

Next, let's inspect the result and then generate the tree object:

# 4. After completing the above steps, check the .git directory
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll .git/
total 8
-rw-r--r-- 1 L 197121 130  4 November 14:50 config
-rw-r--r-- 1 L 197121  73  4 November 14:50 description
-rw-r--r-- 1 L 197121  23  4 November 14:50 HEAD
drwxr-xr-x 1 L 197121   0  4 November 14:50 hooks/
-rw-r--r-- 1 L 197121 104  4 November 15:39 index	# The index file appears
drwxr-xr-x 1 L 197121   0  4 November 14:50 info/
drwxr-xr-x 1 L 197121   0  4 November 14:59 objects/
drwxr-xr-x 1 L 197121   0  4 November 14:50 refs/
# A hint: the staging area (stage) can be understood as simply an index file,
# namely the .git/index file. (key points)

# 5. Check the contents of the Git database: it still holds only the earlier blob object.
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

# 6. Write the contents of the staging area to the local repository.
# In other words, write the file index (snapshot) held in the staging area to the local repository
# using the write-tree command.
# That is, the write-tree command generates a tree object.
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579

# 7. Check the contents of Git database again. There is one more d8 object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579

# 8. Check the type of d8 object and you can see that it is a tree object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree

# 9. Check the current state of the staging area again: it is not empty
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 83baae61804e65cc73a7201a7252750c76066a30 0       test.txt
# Note: writing the contents of the staging area to the repository does not clear the staging area. (key points)

4) Summary

The above is the process of manually creating a tree object using the underlying commands in Git.

  • Create a file and write it to the local repository with the git hash-object command.
  • Add the file to the staging area with the git update-index command.
  • Write the file index information in the staging area to the local repository with the git write-tree command, which generates a tree object.
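The three steps above can be condensed into a short script. This is a sketch that assumes a scratch repository in a temporary directory; because object keys depend only on content, it should reproduce the same blob and tree keys as the walkthrough.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

printf 'version 1\n' > test.txt

# Step 1: write the file content to the object database as a blob.
blob=$(git hash-object -w test.txt)

# Step 2: stage the blob under the name test.txt.
git update-index --add --cacheinfo 100644,"$blob",test.txt

# Step 3: write the staging area out as a tree object.
tree=$(git write-tree)

echo "blob: $blob"   # 83baae61804e65cc73a7201a7252750c76066a30
echo "tree: $tree"   # d8329fc1cc938780ffdd9f94e0d364e0ea74f579
git cat-file -p "$tree"
```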

(3) Create a second file (focus)

1) Create the new.txt file and modify the content of test.txt

# 1. Create the new.txt file
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo "new file" > new.txt

# 2. Modify the content of test.txt
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo "version 2" >> test.txt

# 3. View the contents of the two files
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ cat new.txt
new file

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ cat test.txt
version 1
version 2

# 4. View the files in the working directory
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll
total 2
-rw-r--r-- 1 L 197121  9  4 November 16:25 new.txt
-rw-r--r-- 1 L 197121 20  4 November 16:25 test.txt

2) Add new.txt and the second version of test.txt to the staging area

Add the test.txt file to the staging area

# 1. View the current file information in the staging area
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 83baae61804e65cc73a7201a7252750c76066a30 0       test.txt

# 2. Write the test.txt file to the local repository
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git hash-object -w ./test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547

# 3. Check the contents of Git database and you can see another 0c object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
# Tip: because the content of test.txt was modified above, the content written to the repository is different, so the hash changes.

# 4. Add the modified test.txt file to the staging area
# test.txt was already added to the staging area earlier, so the --add option is not needed
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git update-index --cacheinfo 100644 \
> 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 test.txt

# 5. View the current file information in the staging area
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 0       test.txt
# We can see that the test.txt entry in the staging area has been overwritten by the latest modified version;
# its hash previously started with 83.
# Tip: the staging area is overwritten per file. The newly modified file replaces its own earlier entry,
# and other files are untouched; that is, the staging area is not overwritten as a whole. (key points)

Add the new.txt file to the staging area

# 1. Add the new.txt file to the staging area
# This time, a single command adds new.txt directly from the working directory to the staging area
# Explanation:
# new.txt is being added to the staging area for the first time, so the --add option is required
# new.txt is read from the working directory, so the --cacheinfo option is not required
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git update-index --add new.txt

# 2. Check the contents of the Git database: a new fa object has been added
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92

# 3. View the current file information in the staging area: new.txt has been added
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 fa49b077972391ad58037050f2a75f74e3671e92 0       new.txt
100644 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 0       test.txt

Note: git update-index --add <file name> performs the two earlier steps in one command.

  1. Store the content of new.txt in the Git repository.
  2. Add new.txt to the staging area.
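A quick way to convince yourself of this equivalence is to compute the hash separately and compare it with what ends up in the staging area. The sketch below assumes a scratch repository in a temporary directory.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q
printf 'new file\n' > new.txt

# Compute the hash only (no -w), so nothing is written to .git/objects yet.
expected=$(git hash-object new.txt)

# One command: stores the blob and adds the entry to the staging area.
git update-index --add new.txt

# The second column of ls-files -s is the hash recorded in the staging area.
staged=$(git ls-files -s new.txt | cut -d' ' -f2)
type=$(git cat-file -t "$staged")

echo "expected: $expected"
echo "staged:   $staged ($type)"
```

The blob exists in the database after update-index even though hash-object -w was never run.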

3) Write the contents of the staging area to the local repository

At this point, the file state in the working directory and in the staging area is the same. You can write it to the local repository with the git write-tree command to generate a tree object.

# 1. Write the contents of the staging area
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git write-tree
163b45f0a0925b0655da232ea8a4188ccec615f5

# 2. Check the contents of the Git database: there is another tree object, whose hash starts with 16
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/16/3b45f0a0925b0655da232ea8a4188ccec615f5
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t 163b45f0a0925b0655da232ea8a4188ccec615f5
tree

At this point, the five objects in the Git repository represent two versions of the project. (Not clear yet? Read on.)
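One way to see that the two tree objects really are two versions of the project is to compare them directly. The sketch below rebuilds both trees in a scratch repository (the temporary directory is an assumption) and uses git diff-tree, a plumbing command not otherwise covered in this article, to list what changed between them.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

# First version: test.txt only.
printf 'version 1\n' > test.txt
git update-index --add test.txt
first=$(git write-tree)

# Second version: test.txt modified, new.txt added.
printf 'version 2\n' >> test.txt
printf 'new file\n' > new.txt
git update-index --add test.txt new.txt
second=$(git write-tree)

# Compare the two snapshots entry by entry (A = added, M = modified).
changes=$(git diff-tree -r --name-status "$first" "$second")
echo "$changes"
```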

(4) Add the first tree object to the staging area to form a new tree object

# 1. View the current file information in the staging area
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 fa49b077972391ad58037050f2a75f74e3671e92 0       new.txt
100644 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 0       test.txt

# 2. Add the first tree object to the staging area
# First tree object hash:d8329fc1cc938780ffdd9f94e0d364e0ea74f579
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579

# 3. Check the file information in the staging area again: there is an additional bak/test.txt entry
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git ls-files -s
100644 83baae61804e65cc73a7201a7252750c76066a30 0       bak/test.txt
100644 fa49b077972391ad58037050f2a75f74e3671e92 0       new.txt
100644 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 0       test.txt

Explanation:

  • read-tree command: reads a tree object into the staging area.
  • --prefix=bak option: reads an existing tree object into the staging area as a subtree under the bak directory.
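The effect of --prefix can be sketched in a scratch repository as follows (the temporary directory is an assumption). The example writes the prefix with a trailing slash, which older Git versions required; git ls-tree is used to list the resulting tree at one level and recursively.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q

printf 'version 1\n' > test.txt
git update-index --add test.txt
first=$(git write-tree)

# Read the first tree into the staging area under bak/, then write a new tree.
git read-tree --prefix=bak/ "$first"
nested=$(git write-tree)

top=$(git ls-tree "$nested")      # one level: bak appears as a single tree entry
all=$(git ls-tree -r "$nested")   # recursive: blobs only, with full paths

printf 'top level:\n%s\nrecursive:\n%s\n' "$top" "$all"
```

The staging area stores flat paths such as bak/test.txt; write-tree turns the shared directory prefix into a nested tree object.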

Next, write the contents of the staging area again to generate a new tree object in the Git repository.

# Generate the new tree object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git write-tree
01ab2a43b1eb150bcf00f375800727df240cf653

# View newly generated objects
# View the type of tree object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t 01ab2a43b1eb150bcf00f375800727df240cf653
tree

# View the contents of the tree object, that is, the contents of the record staging area.
# You can see that the tree object contains two blob objects and a tree object.
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -p 01ab2a43b1eb150bcf00f375800727df240cf653
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579    bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
100644 blob 0c1e7391ca4e59584f8b773ecdbbb9467eba1547    test.txt


# View the objects in the current Git repository
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/01/ab2a43b1eb150bcf00f375800727df240cf653
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/16/3b45f0a0925b0655da232ea8a4188ccec615f5
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92

Here we have finished our demonstration. Please see the summary below.

3. Summary

(1) Analyze the storage structure of each tree object

First, list the objects in the local Git repository:

.git/objects/01/ab2a43b1eb150bcf00f375800727df240cf653 # The third tree object
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547 # test.txt second version (blob object)
.git/objects/16/3b45f0a0925b0655da232ea8a4188ccec615f5 # Second tree object
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt first version (blob object)
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # First tree object
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt first version (blob object)

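Next, we describe the structural relationship of the three tree objects.

The structure of the first tree object (d8329f) is as follows:

100644 blob 83baae61804e65cc73a7201a7252750c76066a30    test.txt

The structure of the second tree object (163b45) is as follows:

100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
100644 blob 0c1e7391ca4e59584f8b773ecdbbb9467eba1547    test.txt

The structure of the third tree object (01ab2a) is as follows:

040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579    bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
100644 blob 0c1e7391ca4e59584f8b773ecdbbb9467eba1547    test.txt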

It can also be represented by Git object type

(2) blob object and tree object (emphasis)

From the structures above, we can see:

  • Each blob object represents one version of a file's content.
  • Each tree object represents one version (snapshot) of the project.

This is what was meant in 2 - (3) above: the five objects in the Git repository represent two versions of the project.

(This level of understanding is enough for now.)

(3) Summary (key points)

Concept and understanding of staging area:

  1. The so-called staging area (stage) is simply an index file, namely the .git/index file.
  2. The index file of the staging area contains a directory tree of files, like a virtual working directory. This directory tree records the file name, file timestamp, file length, file type, and, most importantly, the SHA-1 value of each file. The file content itself is not stored there, which is why it is like a virtual working directory.
    That is, the staging area (the .git/index file) stores an index (snapshot) of the file content, or an index of tree objects.
  3. The index points to files (Git objects) in the .git/objects/ directory.
  4. Git creates a tree object based on the file index information in the staging area.
  5. A tree object associates file content with a file name.
  6. A tree object can contain one or more records (tree objects and blob objects).
  7. After the contents of the staging area are written to the repository, the index contents of the staging area are not cleared.
  8. The file index in the staging area is overwritten per file: modifying a file and adding it to the staging area replaces only the entry for that file. Other entries are untouched; that is, the staging area is not overwritten as a whole.
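Points 4 and 7 can be checked together with a short sketch (again in a scratch repository in a temporary directory, which is an assumption): the staging area is unchanged by write-tree, and writing the same staging area twice yields the same tree key.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q
printf 'version 1\n' > test.txt
git update-index --add test.txt

before=$(git ls-files -s)
tree1=$(git write-tree)
after=$(git ls-files -s)   # same entries as before: write-tree does not clear the index
tree2=$(git write-tree)    # same index content, so the same tree key

echo "$tree1"
echo "$tree2"
```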

The role of the staging area: unless a commit bypasses the staging area, Git must store modifications in the staging area before committing. Each commit records the file snapshot held in the staging area.

Tip: a Git object's hash key can be abbreviated to its first few characters (at least four). While practicing with only a few objects, there is no need to type the whole hash; a prefix that identifies a single object is enough.
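A minimal sketch of using an abbreviated key, assuming a scratch repository in a temporary directory that contains a single object; git rev-parse --verify is used here to expand the prefix back to the full key.

```shell
# Scratch repository in a temporary directory (illustrative location).
repo=$(mktemp -d)
cd "$repo" && git init -q
full=$(printf 'version 1\n' | git hash-object -w --stdin)

# Take the first seven characters of the key.
short=$(printf '%s' "$full" | cut -c1-7)

# Git expands the unique prefix back to the full key.
resolved=$(git rev-parse --verify "$short")
type=$(git cat-file -t "$short")

echo "$short -> $resolved ($type)"
```

If a prefix matches more than one object, Git reports it as ambiguous and a longer prefix is needed.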

4. Question

Now there are three tree objects (because we ran the write-tree command three times), representing three snapshots of the project we want to track. However, a problem remains: to reuse these snapshots, you must remember the SHA-1 hash values of all three tree objects.

And you have no idea who saved the snapshots, when they were saved, or why.

The commit object stores exactly this basic information for you.

5. Summary of commands used in this article

Git plumbing commands:

  • git update-index --add: add a file snapshot to the staging area.
  • git write-tree: write the index contents of the current staging area out as a tree object.
  • git ls-files -s: view the file information in the staging area.
  • git read-tree --prefix=bak: add an existing tree object to the staging area.
  • git cat-file -t <key>: check the type of a Git object.
  • git cat-file -p <key>: view the content of a Git object.
