[Learn and Forget] Git Principles - 15. Git Objects [Blob Objects]

Posted by stevenm on Fri, 18 Feb 2022 13:59:15 +0100

Git is a content-addressable file system. What does that mean?

At its core, Git is a simple key-value data store. You can insert any kind of content into it, and Git returns a key that you can use to retrieve that content at any time.
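The key-value idea can be illustrated with a toy store in which the key is derived from the value itself. This is a minimal sketch under simplifying assumptions: the class name ToyContentStore is hypothetical, and it hashes the raw content only. Git itself also hashes a small header together with the content, so these keys will not match Git's.

```python
import hashlib


class ToyContentStore:
    """A toy content-addressed key-value store (hypothetical, not Git's code).

    The key of each value is the SHA-1 hex digest of the value itself, so
    identical content always maps to the same key.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Storing the same content twice simply overwrites an identical entry.
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ToyContentStore()
key = store.put(b"git object test content\n")
print(key, store.get(key))
```

Because the key is computed from the content, the caller never chooses a key, and storing the same bytes again cannot create a second entry.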

(1) Directory where Git objects are stored

All Git objects are stored in the .git/objects directory of the local repository, which serves as Git's object database.

First, quickly create a Git repository:

# Create a local Git repository named git_learning
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository
$ mkdir git_learning

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository
$ cd git_learning/

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning
$ git init
Initialized empty Git repository in J:/git-repository/git_learning/.git/

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll -a
total 8
drwxr-xr-x 1 L 197121 0  4 October 20:24 ./
drwxr-xr-x 1 L 197121 0  4 October 20:23 ../
drwxr-xr-x 1 L 197121 0  4 October 20:24 .git/

Confirm that the objects directory is in the default initial state:

# View the .git/objects/ directory
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll -a .git/objects/
total 4
drwxr-xr-x 1 L 197121 0  4 October 20:24 ./
drwxr-xr-x 1 L 197121 0  4 October 20:24 ../
drwxr-xr-x 1 L 197121 0  4 October 20:24 info/
drwxr-xr-x 1 L 197121 0  4 October 20:24 pack/

# View the contents in the info directory and the pack directory
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll -a .git/objects/info/
total 0
drwxr-xr-x 1 L 197121 0  4 October 20:24 ./
drwxr-xr-x 1 L 197121 0  4 October 20:24 ../

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll -a .git/objects/pack/
total 0
drwxr-xr-x 1 L 197121 0  4 October 20:24 ./
drwxr-xr-x 1 L 197121 0  4 October 20:24 ../

As shown above, when Git initializes a local repository, it already creates the objects directory together with its pack and info subdirectories. Apart from these two subdirectories there are no other files, and both subdirectories are empty. From here on, we only look at changes in the objects directory outside info and pack.

(2) Object types in Git

There are four object types in Git: blob objects, tree objects, commit objects and tag objects. These four atomic object types form the basis of Git's higher-level data structures.

(3) blob object

1. blob object description

(1) blob object definition

Blob objects are also called data objects.

Blob objects store file content. That is, the content of a file (text or binary) is stored in Git as a blob object.

In other words:

  • A blob object corresponds to a file in the file system, or more precisely to the file's content. As a key-value pair:
    Key: the SHA-1 hash computed from the content (plus a small header),
    Value: the content of the file.
  • Notably, a blob object stores only the content, not the file name; file names are stored in tree objects.

[Figure: how blob objects are stored]

(2) blob object demonstration

We will demonstrate this with the low-level (plumbing) command git hash-object, which can store any data in the .git/objects directory (the object database) and returns the unique key of the resulting data object.

1) Create a new data object and manually store it in your new Git database:

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo 'git object test content' | git hash-object -w --stdin
cb2eb834126f53952590c448f14fece6cbb1bff3

Note: This is the simplest way to store data in a Git repository. git hash-object takes whatever content you pass to it and returns the unique key of the resulting data object; with -w, the object is also stored in the Git repository.

The meaning of the command is as follows:

  • git hash-object: a low-level (plumbing) Git command that returns the key representing the content passed to it.
  • 'git object test content': the content to store (here supplied by echo rather than read from a text file).
  • -w option: tells hash-object to write the data object into the Git database; without it, the command only prints the corresponding key (the hash string).
  • --stdin option: tells the command to read the content from standard input (here, the output of echo). Without it, the path of the file to store must be given at the end of the command, for example: git hash-object -w <file path>.

The command outputs a 40-character checksum, a SHA-1 hash value, shown above as cb2eb834126f53952590c448f14fece6cbb1bff3. It is computed over the original content prefixed with a specific header (for a blob: the word blob, a space, the content length in bytes, and a NUL byte). Identical content always produces the same hash.
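The hash computation described above can be reproduced outside Git. The sketch below assumes the header layout just described ("blob", a space, the decimal byte length, a NUL byte); the function name blob_id is hypothetical. You can compare its printed value with the hash that git hash-object printed above.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to `content` as a blob."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# `echo` appends a newline, so the bytes hashed by
# `echo 'git object test content' | git hash-object --stdin` end with "\n".
print(blob_id(b"git object test content\n"))
```

Including the length and type in the hashed bytes is why the key is not simply the SHA-1 of the file content: the same bytes stored as a different object type would get a different id.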

2) Inspect the .git/objects directory of the local repository in the file explorer

A new cb folder has appeared in the .git/objects directory:

[Screenshot: .git/objects containing the new cb folder]

Inside the cb folder there is a single file:

[Screenshot: the file inside the cb folder]

That is enough for the graphical view, since the file itself is hard to read there. Let's go back to Git Bash.

3) Inspect the .git/objects directory in Git Bash

# Add the -type f option to list files only
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects
.git/objects
.git/objects/cb
.git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3
.git/objects/info
.git/objects/pack

There is a new cb directory containing a file named 2eb834126f53952590c448f14fece6cbb1bff3.

This is how a blob object is stored in Git.

(3) How blob objects are stored

Git objects are addressed by a 40-character hexadecimal string, the SHA-1 hash (160 bits), such as cb2eb834126f53952590c448f14fece6cbb1bff3.

To keep the file system manageable, the first two characters are used as the name of a subdirectory under .git/objects/, and the remaining 38 characters are used as the file name.

As follows:

# List the files in the objects directory. Each file is one Git object (data content)
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3

# Output the hash key of the data object without saving the data object.
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo 'git object test content' | git hash-object --stdin
cb2eb834126f53952590c448f14fece6cbb1bff3

The hash printed without -w is the same as the stored object's name (directory cb plus file name 2eb834126f53952590c448f14fece6cbb1bff3).

Tip: you may worry that two different contents could produce the same 40-character hash. That is possible in principle, but the probability is negligible.
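The naming rule described in this section (two characters for the subdirectory, 38 for the file name) can be sketched as a small path function. The name loose_object_path and the default git_dir of ".git" are assumptions for illustration.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str, git_dir: str = ".git") -> PurePosixPath:
    """Map a 40-character object id to its loose-object file path."""
    if len(object_id) != 40:
        raise ValueError("expected a full 40-character SHA-1 hex id")
    # First two hex characters -> subdirectory, remaining 38 -> file name.
    return PurePosixPath(git_dir, "objects", object_id[:2], object_id[2:])


print(loose_object_path("cb2eb834126f53952590c448f14fece6cbb1bff3"))
# prints .git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3
```

Splitting on the first two characters spreads objects across at most 256 subdirectories, which keeps any single directory from holding a huge number of files.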
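How small "negligible" is can be estimated with the birthday bound. This sketch assumes SHA-1 behaves like an ideal random function over 160 bits, which ignores deliberately crafted collisions; the function name and the figure of one billion objects are illustrative assumptions.

```python
def collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday-bound estimate: P(collision) <= n(n-1)/2 / 2**bits.

    Assumes an ideal hash whose outputs are uniformly random; it says
    nothing about deliberately crafted collisions.
    """
    pairs = num_objects * (num_objects - 1) // 2
    return pairs / 2 ** hash_bits


# One billion objects, far more than a typical repository holds.
print(collision_probability(10**9))
```

Even for a billion objects the estimate is on the order of 1e-31, which is why accidental collisions are not a practical concern.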

(4) View blob object content

Let's read the object file above directly with the cat command and see what happens:

[Screenshot: output of cat on the object file]

The output is unreadable binary data.

To read the data, we look it up by its hash key with git cat-file -p <key>.

The -p option tells the command to detect the object type automatically and display the content in a readable (pretty-printed) format.

As follows:

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -p cb2eb834126f53952590c448f14fece6cbb1bff3
git object test content

Tips:

Why does reading a Git object file directly with cat produce garbled output?

Git first compresses the content (together with its header) with zlib and then writes the compressed bytes to a disk file, using the first two characters of the SHA-1 as the subdirectory name and the remaining 38 characters as the file name.
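The zlib step can be sketched as a pair of encode/decode helpers. The function names are hypothetical, and the compressed bytes produced here are not guaranteed to be byte-identical to the file Git writes (compression settings may differ), but decoding works the same way for any zlib stream: decompress, split off the header at the first NUL byte, then read the type and size.

```python
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build loose-object bytes: zlib(header + content)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Return (type, content), roughly what `git cat-file -t` / `-p` report for a blob."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


# To read a real object instead, pass the file's bytes, e.g.
# open(".git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3", "rb").read()
raw = encode_loose_object("blob", b"git object test content\n")
print(decode_loose_object(raw))
```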

(5) View the type of Git object

The command git cat-file -t <key> shows the type of a Git object in the .git/objects directory:

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t cb2eb834126f53952590c448f14fece6cbb1bff3
blob

This also shows that the Git object we previously stored is a blob object.

(6) Managing a file with Git

So far, you have learned how to store content in Git and how to retrieve it.

We can apply the same operations to the content of files, for example to perform simple version control on a single file.

1) First, create a file

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo "hello-git.txt v1" > hello-git.txt

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ ll
total 1
-rw-r--r-- 1 L 197121 17  4 October 23:17 hello-git.txt

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ cat hello-git.txt
hello-git.txt v1

At this point, the file is not yet managed by Git.

2) Store the hello-git.txt file in the Git database

# Write the file into the object database, creating a blob object
L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git hash-object -w ./hello-git.txt
a620c95d3001e1f64cecfc6715f9750cc7bbbf98

3) View Git database content

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/a6/20c95d3001e1f64cecfc6715f9750cc7bbbf98
.git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3

You can see that there is an a6 subdirectory, which indicates that an object has been added.

4) View the contents of the a6 object

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -p a620c95d3001e1f64cecfc6715f9750cc7bbbf98
hello-git.txt v1

5) View the type of the a6 object

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -t a620c95d3001e1f64cecfc6715f9750cc7bbbf98
blob

Whether you store a file or content typed at the console (standard input), what ends up in the Git database is a Git object of type blob. In other words, blob objects store data content.

(7) Managing a modified file with Git

1) Next, modify the hello-git.txt file by appending a line

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ echo "hello-git.txt v2" >> hello-git.txt

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ cat hello-git.txt
hello-git.txt v1
hello-git.txt v2

2) View objects in Git database

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/a6/20c95d3001e1f64cecfc6715f9750cc7bbbf98
.git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3

Only the two objects cb and a6 are present, which shows that the modified file is not stored in the Git database automatically.

We have to store the modified hello-git.txt file in the Git database manually.

3) Store the modified hello-git.txt file in the Git database

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git hash-object -w ./hello-git.txt
7c320a2d671f2ff177063f98343a0123432521dd

4) Look at the objects in the Git database again

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ find .git/objects -type f
.git/objects/7c/320a2d671f2ff177063f98343a0123432521dd
.git/objects/a6/20c95d3001e1f64cecfc6715f9750cc7bbbf98
.git/objects/cb/2eb834126f53952590c448f14fece6cbb1bff3

We can see that there is one more 7c object in Git database.

5) View the content stored in the 7c object

L@DESKTOP-T2AI2SU MINGW64 /j/git-repository/git_learning (master)
$ git cat-file -p 7c320a2d671f2ff177063f98343a0123432521dd
hello-git.txt v1
hello-git.txt v2

As shown above, the 7c object stores the contents of both v1 and v2, so the v1 content now exists in both the a6 object and the 7c object. In other words, Git does not store the increment (the diff) of a file's content; each blob holds the complete content of that version.
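The same observation can be reproduced with the blob-hashing rule described earlier. In this sketch the helper blob_id is hypothetical and the dict named objects merely stands in for the object database; it shows that the v2 blob contains the full v1 text again rather than only the appended line.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 of the blob header plus content, as described in this article."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


v1 = b"hello-git.txt v1\n"
v2 = v1 + b"hello-git.txt v2\n"  # what `echo ... >> hello-git.txt` appends

objects = {blob_id(v1): v1, blob_id(v2): v2}
for oid, content in objects.items():
    print(oid[:7], len(content), "bytes")
```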

2. blob object summary

  • At its core, Git is a simple key-value data store: the key is the hash of the content (plus its header), and the value is the content.
  • Blob objects are stored in the .git/objects directory; the subdirectory name plus the file name form the 40-character hash, which is the key of the object.
  • The key can be used to look up the corresponding content.
  • Before content is stored in the Git database, it is compressed with zlib.
  • Blob objects store the content of files. Identical content does not produce a new blob object.
  • Blob objects do not store file names.

Tip: when referring to a Git object by its hash key, you do not need to type all 40 characters. The first few characters (Git accepts as few as four) are enough, as long as they identify a unique object.
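Resolving an abbreviated key amounts to a unique-prefix lookup. This is a hypothetical helper, not Git's implementation; the four-character minimum mirrors Git's shortest accepted abbreviation, and the ids are the three objects created in this article.

```python
def resolve_prefix(prefix: str, object_ids: list[str], min_len: int = 4) -> str:
    """Expand an abbreviated object id to the single full id it matches.

    Hypothetical helper mimicking how Git accepts short hashes.
    """
    if len(prefix) < min_len:
        raise ValueError(f"prefix must be at least {min_len} characters")
    matches = [oid for oid in object_ids if oid.startswith(prefix.lower())]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise KeyError(f"prefix {prefix!r} is ambiguous: {len(matches)} matches")
    return matches[0]


ids = [
    "7c320a2d671f2ff177063f98343a0123432521dd",
    "a620c95d3001e1f64cecfc6715f9750cc7bbbf98",
    "cb2eb834126f53952590c448f14fece6cbb1bff3",
]
print(resolve_prefix("a620c9", ids))
```

An ambiguous prefix is rejected rather than guessed, which is why a longer prefix is needed as a repository accumulates more objects.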

3. Question

Every time we modify a file and store it in the Git database, a new blob object is created. In real work, we make many changes before committing a version. Can a single blob object represent a snapshot of the whole project?

No. A blob object only represents the content of one file at the moment it was stored. If the content is identical to previously stored content, no new object is added; if it differs, a new blob object is added. That is, every distinct piece of content managed by Git has a corresponding blob object.

Then there are the following questions:

  1. It is unrealistic to remember the SHA-1 value corresponding to each version of the file.
  2. In the blob object, the file name is not saved, only the contents of the file are saved.

Without file names, data cannot be read by file name; it can only be read by its 40-character hash, which is impractical.

Solution: tree object.

Tip: all of the above operations go directly between the working directory and the local repository's object database; the staging area (index) is not involved.

4. Summary of commands used in this article

Git low-level (plumbing) commands:

  • git hash-object -w <file path>: store a file from the working directory in the object database as a blob object.
  • find .git/objects -type f: list the objects in the Git database (a Linux/Unix command, not a Git command).
  • git cat-file -p <key>: show the content of a Git object.
  • git cat-file -t <key>: show the type of a Git object.
