Shell learning notes: common commands tar, cpio, gzip, zip

Posted by mooler on Mon, 29 Nov 2021 00:30:30 +0100

From Chapter 7 of Linux Shell script introduction

Archiving with tar

The tar command archives files. It was originally designed to store data on tape, and its name comes from Tape ARchive. tar can package multiple files and folders into a single file while retaining file attributes such as owner and permissions. Files created by tar are often called tarballs. In this section, we will learn how to create an archive using tar

Archiving: combining multiple files into one file. Archiving reduces the number of files to manage, which is convenient when sending e-mail attachments and when backing up files

Compression: processing a file with a lossy or lossless algorithm so that it keeps as much information as possible while taking up less space. Compression saves disk space, shrinks e-mail attachments and speeds up transfers. The tools in these notes (gzip, bzip2, lzma, zip) are all lossless

Creating archive files with tar

$ tar -cf output.tar coco.sh
$ ls output.tar 
output.tar

$ tar -cf archive.tar file1 file2 file3 folder1 ..

Option -c means to create a new archive.
Option -f specifies the archive file name, which must follow it immediately

Option -t lists the files contained in the archive

$ tar -tf output.tar 
coco.sh

Option -v or -vv adds more detail to the command output

This feature is called "verbose mode (v, verbose)" or "very verbose mode (vv, very verbose)".
For commands that print reports to the terminal, -v is a conventional option. With tar it shows details such as file permissions, owner and group, and file modification date

$ tar -tvf output.tar 
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh

The archive file name must appear immediately after -f, so when options are bundled together, -f should be the last one. To use verbose mode, write it like this

$ tar -cvf output.tar file1 file2 file3 folder1 ..
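The create and list options above can be tried safely in a scratch directory. This is a minimal sketch assuming GNU tar; the directory from mktemp and the file names are hypothetical and used only for illustration.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

# Sample content to archive (illustrative names).
printf 'hello\n' > file1
printf 'world\n' > file2
mkdir folder1
printf 'nested\n' > folder1/file3

# -c creates, -v prints each member as it is added, -f names the archive.
tar -cvf output.tar file1 file2 folder1

# -t lists the members; adding -v shows permissions, owner/group, size and date.
tar -tvf output.tar

# Names only, saved so the result can be inspected afterwards.
tar -tf output.tar > members.txt
```

Because folder1 is a directory, tar stores the directory entry itself as well as the file inside it.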

Append file to archive

Option -r appends a new file to the end of an existing archive file:

$ tar -rvf output.tar file1.txt 
file1.txt
$ tar -tvf output.tar 
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt

Extract a file or directory from an archive

Option -x extracts the contents of the archive file to the current directory

$ ls coco.sh
ls: cannot access 'coco.sh': No such file or directory
$ tar -xf output.tar
$ ls coco.sh 
coco.sh
# When -x is used, the tar command extracts the contents of the archive file to the current directory. We can also use option -C to specify which directory to extract files to
$ tar -xf archive.tar -C /path/to/extraction_directory

This command extracts the contents of the archive file to the specified directory. Without further arguments, it extracts everything in the archive. We can extract specific files by giving their names as command-line arguments

$ tar -xvf file.tar file1 file4
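To see the difference between extracting everything and extracting named members, here is a small sketch, again assuming GNU tar. The directories all and only_some are hypothetical names chosen for this example.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'one\n'  > file1
printf 'two\n'  > file2
printf 'four\n' > file4
tar -cf file.tar file1 file2 file4

# -C does not create the target directory, so make it first.
mkdir all only_some

# Extract every member into ./all
tar -xf file.tar -C all

# Extract only file1 and file4 into ./only_some
tar -xf file.tar -C only_some file1 file4

ls all only_some
```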

Using stdin and stdout in tar

$ tar cvf - files/ | ssh user@example.com "tar xvf - -C Documents/"

In the above example, the contents in the files directory are archived and output to stdout (indicated by -), and then extracted into the Documents directory in the remote system
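The same pipe can be tried without a remote host. The sketch below replaces the ssh side with a second local tar process, which is an assumption made only so that the example runs anywhere; the directory names are illustrative.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

mkdir -p files/sub Documents
printf 'a\n' > files/a.txt
printf 'b\n' > files/sub/b.txt

# "-f -" means stdout when creating and stdin when extracting,
# so the archive never touches the disk.
tar -cf - files/ | tar -xvf - -C Documents/
```

Writing "-f -" explicitly on the receiving side avoids depending on whatever default archive device the local tar build uses.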

Concatenate two archives

Option -A (--concatenate) appends the members of one archive to the end of another

$ tar -tf original.tar 
file1.txt
$ tar -Af original.tar output.tar 
$ tar -tvf original.tar 
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt
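A reproducible version of the concatenation, as a minimal sketch assuming GNU tar and uncompressed archives. The file contents here are made up for the example.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'first\n'     > file1.txt
printf 'echo coco\n' > coco.sh
tar -cf original.tar file1.txt
tar -cf output.tar coco.sh

# -A (--concatenate) appends the members of output.tar to original.tar.
# output.tar itself is left unchanged.
tar -Af original.tar output.tar

tar -tf original.tar > merged.txt
cat merged.txt
```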

Update the contents of the archive by checking the timestamp

The append option (-r) adds any specified file to the archive. If a file with the same name already exists, the archive file will contain two files with the same name. We can use the update option -u so that a file is added only if it is newer than the file with the same name in the archive

$ tar -uf original.tar file1.txt 
$ tar -tvf original.tar 
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt
-rw-rw-r-- amlogic/amlogic  12 2021-11-27 17:40 file1.txt
-rw-rw-r-- amlogic/amlogic  12 2021-11-27 17:43 file1.txt
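The difference between -r and -u is easier to see when the modification times are controlled. In this sketch (GNU tar assumed) touch -t pins the timestamps to hypothetical whole-second values so that the comparison is deterministic.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'v1\n' > file1.txt
touch -t 202001010000 file1.txt          # known, old modification time
tar -cf original.tar file1.txt

# -u: the file is not newer than the archived copy, so nothing is added.
tar -uf original.tar file1.txt
after_u_unchanged=$(tar -tf original.tar | wc -l)

# -r: appends unconditionally, producing a second entry with the same name.
tar -rf original.tar file1.txt
after_r=$(tar -tf original.tar | wc -l)

# Modify the file and give it a newer timestamp; now -u appends it.
printf 'v2 is longer\n' > file1.txt
touch -t 202101010000 file1.txt
tar -uf original.tar file1.txt
after_u_changed=$(tar -tf original.tar | wc -l)

echo "entries: $after_u_unchanged, $after_r, $after_u_changed"
```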

Compare the contents of the archive file with the file system

Option -d compares files in the archive to files in the file system. This function can be used to determine whether a new archive needs to be created

$ tar -df original.tar 
file1.txt: Mod time differs
file1.txt: Size differs
file1.txt: Mod time differs
file1.txt: Size differs
file1.txt: Mod time differs
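In a script, the exit status of tar -d is usually more convenient than its messages. The following sketch assumes GNU tar, which exits with a non-zero status when archive members differ from the files on disk; the variable names are illustrative.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'original\n' > file1.txt
touch -t 202001010000 file1.txt   # whole-second mtime, stored exactly in the archive
tar -cf original.tar file1.txt

# Nothing has changed yet: tar -d prints nothing and succeeds.
if tar -df original.tar; then before=same; else before=differs; fi

# Change the file's size and modification time.
printf 'a longer replacement line\n' > file1.txt

# tar -d now reports the differences and exits with a non-zero status.
if tar -df original.tar; then after=same; else after=differs; fi

echo "before: $before, after: $after"
```

A backup script could use this check to decide whether a new archive needs to be created.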

Delete files from archive

$ tar -tvf original.tar 
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic  10 2021-11-23 16:27 file1.txt
-rw-rw-r-- amlogic/amlogic  12 2021-11-27 17:40 file1.txt
-rw-rw-r-- amlogic/amlogic  12 2021-11-27 17:43 file1.txt
$ tar -f original.tar --delete file1.txt 
$ tar -tvf original.tar 
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
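As the listing shows, --delete removes every entry with the given name, not just one. A minimal sketch of that behaviour, assuming GNU tar and an uncompressed archive:

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'data\n'      > file1.txt
printf 'echo coco\n' > coco.sh
tar -cf original.tar file1.txt coco.sh
tar -rf original.tar file1.txt       # append a duplicate entry

tar -tf original.tar                 # file1.txt, coco.sh, file1.txt

# Remove all entries named file1.txt from the archive.
tar --delete -f original.tar file1.txt

tar -tf original.tar > remaining.txt
cat remaining.txt
```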

Compress tar Archive

The tar command only archives files by default and does not compress them. However, tar supports options for compression. Compression can significantly reduce the size of files. Archive files are usually compressed into one of the following formats

gzip format: file.tar.gz or file.tgz.
bzip2 format: file.tar.bz2.
LZMA (Lempel-Ziv-Markov chain algorithm) format: file.tar.lzma.

Different tar options can be used to specify different compression formats

-j specifies the bzip2 format;
-z specifies the gzip format;
--lzma specifies the lzma format.

To have tar select the compression algorithm automatically based on the file extension, use the -a or --auto-compress option

$ tar -acvf coco.tar.gz coco.sh 
coco.sh
$ tar -tvf coco.tar.gz 
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
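Note that the extension alone does nothing when creating an archive: without -a (or -z), tar writes an uncompressed archive whatever the file is called. The sketch below, assuming GNU tar and gzip, uses gzip -t to check which of two files is really gzip data. The name plain.tar.gz is a deliberately misleading, hypothetical file name.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

printf 'echo coco\n' > coco.sh

# With -a, the .gz suffix makes tar run the data through gzip.
tar -acf coco.tar.gz coco.sh

# Without -a or -z, the same suffix still yields a plain tar file.
tar -cf plain.tar.gz coco.sh

# gzip -t succeeds only for valid gzip data.
if gzip -t coco.tar.gz 2>/dev/null;  then auto=gzip;  else auto=not-gzip;  fi
if gzip -t plain.tar.gz 2>/dev/null; then plain=gzip; else plain=not-gzip; fi

echo "coco.tar.gz: $auto, plain.tar.gz: $plain"
```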

Exclude some files from archiving

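$ tar -cf arch.tar --exclude "*.txt" *
# You can also put the list of files to be excluded into a file and pass it with option -X
$ cat list 
filea 
fileb
$ tar -cf arch.tar -X list *

Note that patterns should be enclosed in double quotes to prevent shell expansion. The exclude options are placed before the file arguments because some versions of GNU tar apply them only to the arguments that follow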
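Both forms of exclusion can be checked by listing the resulting archives. This is a sketch assuming GNU tar, whose exclude patterns by default match at any directory level; the src directory and its files are hypothetical.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

mkdir src
printf 'x\n' > src/a.txt
printf 'x\n' > src/b.sh
printf 'x\n' > src/c.txt
printf 'x\n' > src/filea

# Quoted pattern: tar, not the shell, interprets the wildcard.
tar -cf arch.tar --exclude "*.txt" src
tar -tf arch.tar > kept.txt

# Exclusion list read from a file with -X.
printf 'filea\n' > list
tar -cf arch2.tar -X list src
tar -tf arch2.tar > kept2.txt

cat kept.txt kept2.txt
```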

Exclude version control directories

The tar option --exclude-vcs excludes files and directories related to version control during archiving

$ tar --exclude-vcs -czvvf source_code.tar.gz eye_of_gnome_svn

Print total bytes

The --totals option prints the total number of bytes written to the archive

$ tar -cvf coco.tar coco.sh --totals
coco.sh
Total bytes written: 10240 (10KiB, 11MiB/s)

Archiving with cpio

cpio is similar to tar. It can archive multiple files and directories while retaining file attributes such as permissions and ownership. The cpio format is used for RPM packages (used by Fedora, for example) and for the initramfs files of the Linux kernel (which hold the initial root file system). This section shows several uses of cpio

# Create test file
$ touch file1 file2 file3

# Archive test files
$ ls file* | cpio -ov > archive.cpio
file1
file2
file3
1 block

# Lists the contents of the cpio archive
$ cpio -it < archive.cpio
file1
file2
file3
1 block

# Extract files from cpio Archive
$ rm file1
$ rm file2
$ rm file3
$ cpio -id < archive.cpio
1 block
$ ls file*
file1  file2  file3

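The cpio options used above are

-o copy-out mode: reads file names from stdin and writes the archive to stdout
-v prints the list of archived files
-i copy-in mode: reads the archive from stdin and extracts it
-d creates leading directories as needed during extraction
-t lists the contents of the archive instead of extracting it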

Compress data using gzip

gzip and gunzip can be used for compression and decompression respectively
gzip does not retain the original file by default

Compress files using gzip

$ gzip test
$ ls test*
test_copy1  testfile  test.gz

Decompress the gzip file

$ gunzip test.gz
$ ls test
test

Lists the attribute information of the compressed file

$ gzip -l test.gz 
 compressed        uncompressed  ratio uncompressed_name
       31                   6     -33.3% test

The gzip command can read in the file from stdin and write out the compressed file to stdout

# Read in from stdin and write out the compressed data to stdout
$ cat test |gzip -c > test.gz
# Option -c writes the output to stdout. This option can also be used together with cpio
$ ls * | cpio -o | gzip -c > cpiooutput.gz 
$ zcat cpiooutput.gz | cpio -it

We can specify the compression level of gzip. The --fast and --best options select the fastest level (lowest compression ratio) and the slowest level (highest compression ratio), respectively

$ gzip --fast test

Compressed archive

The first method

$ tar -czvvf archive.tar.gz [FILES]
# or
$ tar -cavvf archive.tar.gz [FILES]
# Option -z indicates that gzip is used for compression, and option -a indicates that the compression format is inferred from the file extension

The second method

# First, create a tar archive
$ tar -cvvf archive.tar [FILES]
# Compress tar Archive
$ gzip archive.tar

The following command can extract the contents of the archive file compressed by gzip

$ tar -xavvf archive.tar.gz -C extract_directory

zcat: read gzip files directly

The zcat command writes the decompressed contents of a .gz file to stdout. The .gz file itself is not changed

$ ls test.gz 
test.gz
$ zcat test.gz 
coco
$ ls test.gz 
test.gz

Compression ratio

gzip supports 9 compression levels, of which

Level 1 has the lowest compression ratio, but the fastest compression speed
Level 9 has the highest compression ratio, but the slowest compression speed

You can specify the compression level as follows

$ gzip -5 test.img
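To compare levels on the same input, it helps to keep the original file and write each result to its own file. The sketch below uses gzip -c for that; the repetitive sample data is generated on the spot and is purely illustrative, so the sizes it prints say little about real files.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

# Build a compressible sample file (illustrative content).
i=0
while [ "$i" -lt 2000 ]; do
    printf 'line %d of some repetitive sample data\n' "$i"
    i=$((i + 1))
done > test.img

# -c writes to stdout, so test.img is kept and each level gets its own file.
gzip -1 -c test.img > fast.gz
gzip -9 -c test.img > best.gz

# Compare the sizes by eye.
ls -l test.img fast.gz best.gz

# Whatever the level, decompression restores identical data.
gzip -dc fast.gz | cmp - test.img
gzip -dc best.gz | cmp - test.img
```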

Using bzip2

bzip2 is similar to gzip in function and syntax. The difference is that bzip2 usually achieves a better compression ratio than gzip, but takes longer

Compress with bzip2

$ bzip2 filename

Decompress a file in bzip2 format

$ bunzip2 filename.bz2

Creating a tar.bz2 file and extracting content from it works in the same way as for tar.gz described earlier

$ tar -xjvf archive.tar.bz2

Here, -j indicates that the archive file is compressed in bzip2 format

Using lzma

The compression ratio of lzma is generally better than that of gzip and bzip2

Compression using lzma

$ lzma filename

Decompress an lzma file

$ unlzma filename.lzma

A tar archive can be compressed as it is created by using the --lzma option. The archive name must directly follow -f, so --lzma goes before the bundled options

$ tar --lzma -cvvf archive.tar.lzma [FILES]
# or
$ tar -cavvf archive.tar.lzma [FILES]

Extract the contents of the lzma compressed tar archive into the specified directory

$ tar --lzma -xvvf archive.tar.lzma -C extract_directory
# Here, -x extracts the content, and --lzma tells tar to use lzma to decompress the archive file
# We can also use
$ tar -xavvf archive.tar.lzma -C extract_directory

Archive and compress using zip

Create a zip archive

$ zip coco.zip file1 file2 file3 
  adding: file1 (stored 0%)
  adding: file2 (stored 0%)
  adding: file3 (stored 0%)
 
$ ls coco*
coco.zip

$ zip file.zip file

Option -r enables recursive archiving of directories

$ zip -r picture.zip picture/
  adding: picture/ (stored 0%)
  adding: picture/bmp/ (stored 0%)
  adding: picture/bmp/Untitled.bmp (deflated 21%)
  adding: picture/bmp/1.bmp (deflated 45%)
  adding: picture/bmp/Snack.bmp (deflated 46%)
  adding: picture/bmp/glass.bmp (deflated 55%)
  adding: picture/bmp/Untitled.bmp (deflated 60%)
  adding: picture/bmp/4.bmp (deflated 48%)
  ...

The unzip command extracts content from a ZIP file

$ unzip file.zip

unzip does not delete the file.zip after the extraction operation (unlike unlzma and gunzip)

Option -u updates the contents of the compressed archive

$ zip file.zip -u newfile

Option -d deletes one or more files from the compressed archive

$ zip -d arc.zip file.txt

Option -l of the unzip command lists the contents of the compressed archive

$ unzip -l picture.zip 
Archive:  picture.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2021-11-11 16:57   picture/
        0  2021-11-03 13:35   picture/bmp/
   674094  2021-11-03 13:35   picture/bmp/Untitled.bmp
   777838  2021-11-03 13:35   picture/bmp/1.bmp
  1081554  2021-11-03 13:35   picture/bmp/Snack.bmp
   682330  2021-11-03 13:35   picture/bmp/glass.bmp
   750054  2021-11-03 13:35   picture/bmp/Untitled.bmp
   921654  2021-11-03 13:35   picture/bmp/4.bmp
        0  2021-11-03 13:35   picture/pic/
   325520  2021-11-03 13:35   picture/pic/qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqaaaaaaaaaaaaaaaaaaaaaaaaaZzzzzzzzzzzzzzzzzzzzzz.png
        0  2021-11-03 13:35   picture/gif/
   150785  2021-11-03 13:35   picture/gif/Unnamed.gif
    12450  2021-11-03 13:35   picture/gif/393020.gif
    ...

Faster archiving tool pbzip2

Most of the compression commands we've seen so far can only use a single processor core. The pbzip2, plzip, pigz and lrzip commands all use multithreading, which can reduce the time required to compress files with the help of multiple cores

These tools are not installed by default in most distributions. You can install them yourself using apt-get or yum

$ sudo apt-get install pbzip2

Compress a single file

$ pbzip2 picture.zip

To compress and archive multiple files or directories, you can use tar with pbzip2

$ tar cf coco.tar.bz2 --use-compress-program=pbzip2 for_fun/ picture/
$ ls -l coco.tar.bz2 
-rw-rw-r-- 1 amlogic amlogic 59820017 Nov 28 17:16 coco.tar.bz2

# or
$ tar -c for_fun/ picture/ | pbzip2 -c > coco.tar.bz2

Extract from a file in pbzip2 format.

$ pbzip2 -d file1.bz2
# If it is a tar.bz2 file, we can use a pipe to decompress and extract it
$ pbzip2 -dc myfile.tar.bz2 | tar -x

Manually specify the number of processors

$ pbzip2 -p4 myfile.tar

The above command tells pbzip2 to use four processor cores

Specify compression ratio

Options -1 to -9 range from the fastest to the best compression: -1 has the fastest compression speed and -9 has the highest compression ratio

Create a compressed file system

The squashfs tools can create a read-only file system with a very high compression ratio; for example, 2GB~3GB of data can fit into a 700MB file. Many Linux LiveCDs (or LiveUSBs) are built with squashfs. Such a disc stores the root file system in a single read-only compressed file, which can be mounted through a loopback device to bring up a full Linux environment. When files are needed, they are decompressed and loaded into memory

Most modern Linux distributions support mounting squashfs file systems. However, to create a squashfs file, you need to install squashfs-tools using the package manager

$ sudo apt-get install squashfs-tools

Use the mksquashfs command to add source directories and files to create a squashfs file

$ mksquashfs SOURCES compressedfs.squashfs
# SOURCES can be wildcards, file or directory paths.

$ sudo mksquashfs /etc test.squashfs 
Parallel mksquashfs: Using 2 processors 
Creating 4.0 filesystem on test.squashfs, block size 131072.

[=======================================] 1867/1867 100%

Mount squashfs files in loopback form

# mkdir /mnt/squash 
# mount -o loop compressedfs.squashfs /mnt/squash

Exclude some files when creating squashfs files

# Option -e can exclude some files and directories
$ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow
# Here the option -e is used to exclude the files /etc/passwd and /etc/shadow

# You can also write the list of file names to be excluded to a file and specify that file with the option -ef
$ cat excludelist 
/etc/passwd 
/etc/shadow

$ sudo mksquashfs /etc test.squashfs -ef excludelist
# If you want to use wildcards in the exclude list, you need to add the -wildcards option

Backup system snapshots using rsync

Backups need to be made regularly. In addition to local files, remote data may also be involved. rsync can synchronize files and directories in different locations while minimizing the amount of data transferred. Compared with the cp command, rsync has the advantage of comparing file sizes and modification times and transferring only the files that have changed. In addition, it supports remote transfers, compression and, when run over ssh, encryption

Copy source directory to destination path

$ rsync -av source_path destination_path
# for example
$ rsync -av /home/slynux/data slynux@192.168.0.6:/home/backups/data

-a indicates archive mode (a recursive copy that preserves permissions, ownership, timestamps and symbolic links)
-v (verbose) prints details or progress on stdout
The above command recursively copies all files from the source path to the destination path. Either the source or the destination can be remote, but not both

Back up data to a remote server or host

$ rsync -av source_dir username@host:PATH

The following command can restore the data on the remote host to the local

$ rsync -av username@host:PATH destination

When transferring over a network, compressing the data can significantly improve transfer efficiency. rsync's option -z compresses data during transfer

$ rsync -avz source destination

Synchronize the contents of one directory to another

$ rsync -av /home/test/ /home/backups

This command copies the contents of the source directory (/home/test), excluding the directory itself, to the existing backups directory

Copy content, including the directory itself, to another directory

$ rsync -av /home/test /home/backups

Exclude some files when archiving with rsync

The options --exclude and --exclude-from specify files that do not need to be transferred

--exclude PATTERN

You can use wildcards to specify files to exclude

$ rsync -avz /home/code/app /mnt/disk/backup/code --exclude "*.o"

Or we can specify the files to be excluded through a list file.
This requires the use of --exclude-from FILEPATH

When updating rsync backups, delete nonexistent files

By default, rsync does not delete files on the destination side that no longer exist on the source side. If you want to delete such files, you can use rsync's --delete option

$ rsync -avz SOURCE DESTINATION --delete

Regular backup

You can create a cron task to back up regularly

$ crontab -e
# Add the following line:
0 */10 * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups

The crontab entry above schedules rsync to run every 10 hours
*/10 in the hour field means every tenth hour counting from 0, so the backup runs at 00:00, 10:00 and 20:00 each day.
If */10 appears in the minute field instead, the backup runs every 10 minutes

Differential archiving

So far, the backup methods we have described make a complete copy of the file system as it is at that moment. This is useful if you notice a problem immediately and can recover from the latest snapshot. However, if you do not notice the problem until after a new snapshot has been made, the previous correct data has already been overwritten by the current wrong data, and this method no longer helps.

An archive of the file system provides a history of file changes. If you need to go back to an earlier version of a damaged file, you can use it.

rsync, tar, and cpio can be used to take daily snapshots of file systems, but this is expensive. If an independent snapshot is created every day, the storage space required for one week is 7 times the size of the backed-up file system.

Differential backups only need to save the files that have changed since the last full backup. The dump/restore tools in Unix support this form of archive backup. Unfortunately, these tools were designed for tape devices, so they are not easy to use.

The find command can achieve the same result together with tar or cpio

Use tar to create the first full backup

$ tar -cvzf /backup/full.tgz /home/user
# Use the -newer option of the find command to determine which files have changed since the last full backup, and then create a new archive
# -type f keeps find from listing changed directories, which tar would otherwise archive recursively
$ tar -czf day-`date +%j`.tgz `find /home/user -type f -newer /backup/full.tgz`

The find command generates a list of all files that have changed since the last full backup (/backup/full.tgz) was created.

The date command generates a file name based on the day of the year (%j, from 001 to 366). Therefore, the differential backup made on January 1 is day-001.tgz, the backup of January 2 is day-002.tgz, and so on.

Since more and more files will have changed after the full backup, the daily differential archive grows larger and larger. When the size of the differential archive exceeds what is acceptable, a new full backup needs to be made.
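The full-plus-differential scheme can be rehearsed in a scratch directory. The sketch below is only an illustration under several assumptions: GNU tar and find, hypothetical paths under a mktemp directory instead of /home/user and /backup, and timestamps pinned with touch -t so that the -newer comparison is deterministic. It also passes the file list to tar through a NUL-separated pipe rather than backticks, which is a swapped-in technique that tolerates file names containing spaces.

```shell
set -e
workdir=$(mktemp -d)        # hypothetical scratch directory
cd "$workdir"

mkdir -p home/user backup
printf 'unchanged\n'   > home/user/old.txt
printf 'will change\n' > home/user/notes.txt
touch -t 202001010000 home/user/old.txt home/user/notes.txt

# Full backup; pin its modification time for a repeatable comparison.
tar -czf backup/full.tgz home/user
touch -t 202101010000 backup/full.tgz

# Simulate a later edit to one file.
printf 'changed\n' > home/user/notes.txt
touch -t 202201010000 home/user/notes.txt

# Differential backup: only regular files newer than the full backup.
# find emits NUL-separated names; tar reads them with --null -T -
diff_name="backup/day-$(date +%j).tgz"
find home/user -type f -newer backup/full.tgz -print0 |
    tar -czf "$diff_name" --null -T -

tar -tzf "$diff_name"
```

To restore, the full backup would be extracted first and the most recent differential archive extracted on top of it.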

Create a full disk image using fsarchiver

fsarchiver can save the contents of an entire disk partition into a compressed archive file. Unlike tar or cpio, fsarchiver retains the extended attributes of files and can be used to restore a file system to a partition. It recognizes and retains the file attributes of Windows and Linux systems, so it is suitable for migrating Samba-mounted partitions

Create file system / partition backup

$ fsarchiver savefs backup.fsa /dev/sda1
# backup.fsa is the final backup file and /dev/sda1 is the partition to be backed up

Back up multiple partitions at the same time

# Use the savefs option and pass multiple partitions as the last arguments of fsarchiver
$ fsarchiver savefs backup.fsa /dev/sda1 /dev/sda2

Restore partition from backup archive

# Use the restfs option of fsarchiver
$ fsarchiver restfs backup.fsa id=0,dest=/dev/sda1
# id=0 indicates that we want to extract the contents of the first file system in the backup archive and restore it to the partition specified by dest=/dev/sda1
# Restore multiple partitions from backup archive
# As before, use the restfs option:

$ fsarchiver restfs backup.fsa id=0,dest=/dev/sda1 id=1,dest=/dev/sdb1
