From Chapter 7 of Linux Shell script introduction
Archiving with tar
The tar command archives files. It was originally designed to store data on tape, and its name comes from Tape ARchive. tar can package multiple files and directories into a single file while retaining file attributes such as owner and permissions. Files created by tar are often called tarballs. In this section, we will learn how to create an archive using tar.
Archiving: combining multiple files into one file. Archiving reduces the number of files to handle, for example when sending e-mail attachments or making backups.
Compression: processing a file with a lossy or lossless algorithm so that it takes up less space while keeping as much of its information as possible. Compression saves disk space, shrinks e-mail attachments, and speeds up transfers.
Creating archive files with tar
$ tar -cf output.tar coco.sh
$ ls output.tar
output.tar
$ tar -cf archive.tar file1 file2 file3 folder1 ..
Option -c creates a new archive.
Option -f specifies the archive file name; it must be followed by that name.
Option -t lists the files contained in the archive:
$ tar -tf output.tar
coco.sh
Option -v or -vv adds more detail to the command output.
This feature is called verbose mode (-v) or very verbose mode (-vv).
For commands that print reports to the terminal, -v is a conventional option. It displays more details, such as file permissions, owner and group, and modification date:
$ tar -tvf output.tar
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
The file name must appear immediately after -f, so when options are combined into one group, -f should come last. To use verbose mode, write it like this:
$ tar -cvf output.tar file1 file2 file3 folder1 ..
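The create and list options above can be tried safely in a scratch directory. The following is a minimal sketch, assuming `mktemp` and a tar that accepts the options shown; the names `notes.txt`, `docs` and `sample.tar` are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample content.
mkdir docs
echo "hello" > notes.txt
echo "readme" > docs/readme.txt

# -c creates the archive; the archive name follows -f directly.
tar -cf sample.tar notes.txt docs

# -t lists the members; adding -v would also show permissions, owner and date.
listing=$(tar -tf sample.tar)
printf '%s\n' "$listing"
```

Directories are archived recursively, so `docs/readme.txt` appears in the listing even though only `docs` was named on the command line.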
Append file to archive
- Option -r appends a new file to the end of an existing archive file:
$ tar -rvf output.tar file1.txt
file1.txt
$ tar -tvf output.tar
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
Extract a file or directory from an archive
- Option -x extracts the contents of the archive file to the current directory
$ ls coco.sh
ls: cannot access 'coco.sh': No such file or directory
$ tar -xf output.tar
$ ls coco.sh
coco.sh
When -x is used, the tar command extracts the contents of the archive file to the current directory. We can also use option -C to specify which directory to extract files to:
$ tar -xf archive.tar -C /path/to/extraction_directory
This command extracts the entire contents of the archive file to the specified directory. To extract only specific files, pass their names as command-line arguments:
$ tar -xvf file.tar file1 file4
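The two extraction styles can be compared side by side. This is a small sketch under the assumption that the target directories are created beforehand (tar's -C does not create them); the directory names `restore_all` and `restore_one` are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files.
echo "one" > file1
echo "two" > file2
tar -cf file.tar file1 file2

# -C changes into an existing directory before extracting.
mkdir restore_all restore_one

# Extract every member into restore_all.
tar -xf file.tar -C restore_all

# Extract only file2 into restore_one.
tar -xf file.tar -C restore_one file2

ls restore_all restore_one
```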
Using stdin and stdout in tar
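$ tar cvf - files/ | ssh user@example.com "tar xvf - -C Documents/"
In the above example, the contents of the files directory are archived and written to stdout (indicated by -), sent over ssh, and then extracted into the Documents directory on the remote system. The remote tar is also given -f - so that it reads the archive from stdin regardless of its built-in default archive device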
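The same pipeline can be tried without a remote host by replacing the ssh hop with a second local tar. This is only a local sketch of the idea; the directory names are invented for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical source tree and target directory.
mkdir -p files/sub Documents
echo "data" > files/sub/a.txt

# The first tar writes the archive to stdout (-f -);
# the second tar reads it from stdin and unpacks it under Documents/.
tar -cf - files/ | tar -xf - -C Documents/

ls -R Documents
```

No intermediate archive file is written to disk, which is the main attraction of this pattern when copying directory trees.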
Concatenate two archives
Option -A appends the contents of one archive to another:
$ tar -tf original.tar
file1.txt
$ tar -Af original.tar output.tar
$ tar -tvf original.tar
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
Update the contents of the archive by checking the timestamp
The append option (-r) adds any specified file to the archive. If a file with the same name already exists, the archive will contain two files with the same name. We can use the update option -u so that a file is added only if it is newer than the file of the same name already in the archive
$ tar -uf original.tar file1.txt
$ tar -tvf original.tar
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
-rw-rw-r-- amlogic/amlogic 12 2021-11-27 17:40 file1.txt
-rw-rw-r-- amlogic/amlogic 12 2021-11-27 17:43 file1.txt
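The difference between -r and -u is easiest to see by counting archive members after each step. The sketch below assumes GNU tar (busybox tar, for instance, has no -u) and uses `touch -t` to pin the timestamps so the outcome does not depend on how fast the script runs; `demo.tar` and the helper `count_members` are hypothetical names.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Helper: number of members stored in an archive.
count_members() { tar -tf "$1" | wc -l | tr -d ' '; }

echo "v1" > file1.txt
touch -t 202101010000 file1.txt        # fixed modification time
tar -cf demo.tar file1.txt
after_create=$(count_members demo.tar)

# -u with an unchanged file: the archived copy is not older, nothing is added.
tar -uf demo.tar file1.txt
after_same=$(count_members demo.tar)

# Make the file newer than the archived copy; -u now appends it.
echo "v2" > file1.txt
touch -t 202201010000 file1.txt
tar -uf demo.tar file1.txt
after_newer=$(count_members demo.tar)

# -r appends unconditionally, even though the file has not changed again.
tar -rf demo.tar file1.txt
after_append=$(count_members demo.tar)

echo "members: create=$after_create same=$after_same newer=$after_newer append=$after_append"
```

Note that neither option replaces the older copy inside the archive; both only append, so the archive keeps every version that was added.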
Compare the contents of the archive file with the file system
Option -d compares files in the archive to files in the file system. This function can be used to determine whether a new archive needs to be created
$ tar -df original.tar
file1.txt: Mod time differs
file1.txt: Size differs
file1.txt: Mod time differs
file1.txt: Size differs
file1.txt: Mod time differs
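In a script, the comparison is usually consumed through tar's exit status rather than its messages. The following sketch assumes GNU tar, which exits with a non-zero status when -d finds a difference; the function name `needs_new_archive` is made up here.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original" > data.txt
touch -t 202101010000 data.txt         # fixed modification time
tar -cf snap.tar data.txt

# Prints "yes" if any archived member differs from the file on disk.
needs_new_archive() {
    if tar -df "$1" >/dev/null 2>&1; then
        echo "no"
    else
        echo "yes"
    fi
}

before=$(needs_new_archive snap.tar)

# Change the file; its size (and modification time) now differ.
echo "extra line" >> data.txt
after=$(needs_new_archive snap.tar)

echo "before change: $before, after change: $after"
```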
Delete files from archive
$ tar -tvf original.tar
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
-rw-rw-r-- amlogic/amlogic 10 2021-11-23 16:27 file1.txt
-rw-rw-r-- amlogic/amlogic 12 2021-11-27 17:40 file1.txt
-rw-rw-r-- amlogic/amlogic 12 2021-11-27 17:43 file1.txt
$ tar -f original.tar --delete file1.txt
$ tar -tvf original.tar
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
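The deletion shown above removes every member with the given name, not just the first one. A reduced sketch of that behaviour, assuming GNU tar and an uncompressed archive (the file contents are invented):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files standing in for the ones in the listing above.
echo '#!/bin/sh' > coco.sh
echo "text" > file1.txt

tar -cf original.tar coco.sh file1.txt
tar -rf original.tar file1.txt          # store a second copy of file1.txt

# --delete removes all members named file1.txt from the archive.
tar -f original.tar --delete file1.txt

remaining=$(tar -tf original.tar)
echo "$remaining"
```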
Compress tar Archive
By default, the tar command only archives files and does not compress them. However, tar supports options for compression, which can significantly reduce the size of the archive. Archive files are usually compressed into one of the following formats
- gzip format: file.tar.gz or file.tgz.
- bzip2 format: file.tar.bz2.
- LZMA (Lempel-Ziv-Markov chain algorithm) format: file.tar.lzma.
Different tar options select different compression formats
- -j specifies the bzip2 format;
- -z specifies the gzip format;
- --lzma specifies the lzma format.
To have tar select the compression algorithm automatically based on the file extension, use the -a or --auto-compress option
$ tar -acvf coco.tar.gz coco.sh
coco.sh
$ tar -tvf coco.tar.gz
-rwxrwxr-x amlogic/amlogic 355 2021-11-25 15:43 coco.sh
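To see what the compression options buy, the size of a plain archive can be compared with a gzip-compressed one. This is a rough sketch assuming GNU tar, gzip and `seq`; the input file `numbers.txt` is just repetitive sample data invented for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data that compresses well.
seq 1 20000 > numbers.txt

tar -cf plain.tar numbers.txt            # archive only
tar -czf explicit.tar.gz numbers.txt     # gzip requested with -z
tar -caf auto.tar.gz numbers.txt         # gzip inferred from the .gz suffix

plain_size=$(wc -c < plain.tar | tr -d ' ')
gz_size=$(wc -c < auto.tar.gz | tr -d ' ')

# gzip -t checks that the file really contains valid gzip data.
gzip -t auto.tar.gz && echo "auto.tar.gz is gzip data"
echo "plain: $plain_size bytes, gzip: $gz_size bytes"
```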
Exclude some files from archiving
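$ tar -cf arch.tar --exclude "*.txt" *
You can also put the list of files to be excluded into a file and pass it with option -X:
$ cat list
filea
fileb
$ tar -cf arch.tar -X list *
Note that patterns should be quoted to prevent the shell from expanding them. Recent versions of GNU tar treat --exclude and -X as positional options that affect only the file arguments following them, so they are placed before the file list here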
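Both ways of excluding files can be checked by listing the resulting archives. The sketch below assumes GNU tar and places the exclusion options before the path they should apply to; the directory `src` and its files are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical source tree.
mkdir src
echo "a" > src/keep.sh
echo "b" > src/skip.txt
echo "c" > src/other.txt

# Exclude by pattern; the quotes keep the shell from expanding *.txt.
tar -cf by_pattern.tar --exclude "*.txt" src
by_pattern=$(tar -tf by_pattern.tar)

# Exclude by list file: one pattern per line.
printf '%s\n' 'src/skip.txt' > list
tar -cf by_list.tar -X list src
by_list=$(tar -tf by_list.tar)

echo "--exclude:"; printf '%s\n' "$by_pattern"
echo "-X list:";   printf '%s\n' "$by_list"
```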
Exclude version control directories
The tar option --exclude-vcs excludes files and directories related to version control during archiving
$ tar --exclude-vcs -czvvf source_code.tar.gz eye_of_gnome_svn
Print the total number of bytes
The --totals option prints the total number of bytes written to the archive
$ tar -cvf coco.tar coco.sh --totals
coco.sh
Total bytes written: 10240 (10KiB, 11MiB/s)
Archiving with cpio
cpio is similar to tar. It can archive multiple files and directories while retaining file attributes such as permissions and ownership. The cpio format is used for RPM packages (the package format used by Fedora) and for the initramfs files that the Linux kernel uses at boot. This section shows several uses of cpio
# Create test files
$ touch file1 file2 file3
# Archive the test files
$ ls file* | cpio -ov > archive.cpio
file1
file2
file3
1 block
# List the contents of the cpio archive
$ cpio -it < archive.cpio
file1
file2
file3
1 block
# Extract files from the cpio archive
$ rm file1
$ rm file2
$ rm file3
$ cpio -id < archive.cpio
1 block
$ ls file*
file1 file2 file3
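For the cpio command
- -o selects output mode: file names are read from stdin and the archive is written to stdout
- -v prints the names of the archived files
- -i selects input mode: the archive is read from stdin
- -t lists the contents of the archive
- -d creates directories as needed during extraction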
Compress data using gzip
gzip and gunzip are used for compression and decompression, respectively
By default, gzip replaces the original file with the compressed one
Compress files using gzip
$ gzip test
$ ls test*
test_copy1 testfile test.gz
Decompress the gzip file
$ gunzip test.gz
$ ls test
test
List the attributes of the compressed file
$ gzip -l test.gz
compressed uncompressed  ratio uncompressed_name
        31            6 -33.3% test
The gzip command can read a file from stdin and write the compressed data to stdout
# Read from stdin and write the compressed data to stdout
$ cat test | gzip -c > test.gz
Option -c sends the output to stdout. This option can also be used together with cpio:
$ ls * | cpio -o | gzip -c > cpiooutput.gz
$ zcat cpiooutput.gz | cpio -it
We can specify the compression level of gzip. The --fast and --best options give the lowest and highest compression ratio, respectively
$ gzip --fast test
Compressed archive
The first method
$ tar -czvvf archive.tar.gz [FILES]
# or
$ tar -cavvf archive.tar.gz [FILES]
Option -z indicates that gzip is used for compression, and option -a indicates that the compression format is inferred from the file extension
The second method
# First, create a tar archive
$ tar -cvvf archive.tar [FILES]
# Then compress the tar archive
$ gzip archive.tar
The following command can extract the contents of the archive file compressed by gzip
$ tar -xavvf archive.tar.gz -C extract_directory
zcat -- directly read gzip format files
The zcat command writes the decompressed contents of a .gz file to stdout. The .gz file itself is not changed
$ ls test.gz
test.gz
$ zcat test.gz
coco
$ ls test.gz
test.gz
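That zcat leaves the compressed file untouched can be confirmed with a checksum taken before and after. A minimal sketch, assuming a Linux system where zcat handles .gz files; the file name `test` and its content mirror the listing above.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'coco\n' > test
gzip test                      # replaces test with test.gz

before=$(cksum < test.gz)
content=$(zcat test.gz)        # decompress to stdout only
after=$(cksum < test.gz)

echo "content: $content"
[ "$before" = "$after" ] && echo "test.gz unchanged"
```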
Compression level
gzip supports nine compression levels, of which
- Level 1 gives the lowest compression ratio but the fastest compression speed
- Level 9 gives the highest compression ratio but the slowest compression speed
You can specify the compression level as follows
$ gzip -5 test.img
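The trade-off between levels can be observed by compressing the same input at level 1 and level 9 and comparing the sizes. This is an illustrative sketch only: the sample file is generated with `seq`, and real data may show a larger or smaller gap.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input invented for the comparison.
seq 1 200000 > test.img

# -c writes to stdout, so the original file is kept.
fast_size=$(gzip -1 -c test.img | wc -c | tr -d ' ')
best_size=$(gzip -9 -c test.img | wc -c | tr -d ' ')

echo "level 1: $fast_size bytes"
echo "level 9: $best_size bytes"
```

Wrapping each gzip call in `time` would show the other side of the trade-off, the extra CPU time spent at level 9.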
Using bzip2
bzip2 is similar to gzip in function and syntax. The difference is that bzip2 usually achieves a higher compression ratio than gzip but takes longer to run
- Compress with bzip2
$ bzip2 filename
- Decompress a file in bzip2 format
$ bunzip2 filename.bz2
- Creating a tar.bz2 file and extracting its contents works the same way as for tar.gz described earlier
$ tar -xjvf archive.tar.bz2
Here, -j indicates that the archive file is compressed in bzip2 format
Using lzma
lzma usually achieves a better compression ratio than gzip and bzip2
- Compression using lzma
$ lzma filename
- Decompress an lzma file
$ unlzma filename.lzma
- A tar archive can be compressed with lzma using the --lzma option. The archive name must directly follow -f, so --lzma is written before it
$ tar -cvv --lzma -f archive.tar.lzma [FILES]
# or
$ tar -cavvf archive.tar.lzma [FILES]
- Extract the contents of an lzma-compressed tar archive into the specified directory
$ tar -xvv --lzma -f archive.tar.lzma -C extract_directory
Here, -x extracts the contents and --lzma tells tar to decompress the archive with lzma. We can also use
$ tar -xavvf archive.tar.lzma -C extract_directory
Archive and compress using zip
Create a zip archive
$ zip coco.zip file1 file2 file3
  adding: file1 (stored 0%)
  adding: file2 (stored 0%)
  adding: file3 (stored 0%)
$ ls coco*
coco.zip
$ zip file.zip file
Option -r enables recursive archiving of directories
$ zip -r picture.zip picture/
  adding: picture/ (stored 0%)
  adding: picture/bmp/ (stored 0%)
  adding: picture/bmp/Untitled.bmp (deflated 21%)
  adding: picture/bmp/1.bmp (deflated 45%)
  adding: picture/bmp/Snack.bmp (deflated 46%)
  adding: picture/bmp/glass.bmp (deflated 55%)
  adding: picture/bmp/Untitled.bmp (deflated 60%)
  adding: picture/bmp/4.bmp (deflated 48%)
...
The unzip command extracts content from a ZIP file
$ unzip file.zip
unzip does not delete the file.zip after the extraction operation (unlike unlzma and gunzip)
Option -u updates the contents of the compressed archive
$ zip file.zip -u newfile
Option -d deletes one or more files from the compressed archive
$ zip -d arc.zip file.txt
Option -l of unzip lists the contents of the compressed archive
$ unzip -l picture.zip
Archive:  picture.zip
   Length        Date   Time  Name
---------  ---------- -----  ----
        0  2021-11-11 16:57  picture/
        0  2021-11-03 13:35  picture/bmp/
   674094  2021-11-03 13:35  picture/bmp/Untitled.bmp
   777838  2021-11-03 13:35  picture/bmp/1.bmp
  1081554  2021-11-03 13:35  picture/bmp/Snack.bmp
   682330  2021-11-03 13:35  picture/bmp/glass.bmp
   750054  2021-11-03 13:35  picture/bmp/Untitled.bmp
   921654  2021-11-03 13:35  picture/bmp/4.bmp
        0  2021-11-03 13:35  picture/pic/
   325520  2021-11-03 13:35  picture/pic/qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqaaaaaaaaaaaaaaaaaaaaaaaaaZzzzzzzzzzzzzzzzzzzzzz.png
        0  2021-11-03 13:35  picture/gif/
   150785  2021-11-03 13:35  picture/gif/Unnamed.gif
    12450  2021-11-03 13:35  picture/gif/393020.gif
...
Faster archiving tool pbzip2
Most of the compression commands we've seen so far can only use a single processor core. The pbzip2, plzip, pigz and lrzip commands all use multithreading, which can reduce the time required to compress files with the help of multiple cores
These tools are not installed by default in most distributions. You can install them yourself using apt-get or yum
$ sudo apt-get install pbzip2
Compress a single file
$ pbzip2 picture.zip
To compress and archive multiple files or directories, you can use tar together with pbzip2
$ tar cf coco.tar.bz2 --use-compress-program=pbzip2 for_fun/ picture/
$ ls -l coco.tar.bz2
-rw-rw-r-- 1 amlogic amlogic 59820017 Nov 28 17:16 coco.tar.bz2
# or
$ tar -cf - for_fun/ picture/ | pbzip2 -c > coco.tar.bz2
Extract from a file in pbzip2 format.
$ pbzip2 -d file1.bz2
If it is a tar.bz2 file, we can use a pipeline to decompress and extract it
$ pbzip2 -dc myfile.tar.bz2 | tar -xf -
Manually specify the number of processors
$ pbzip2 -p4 myfile.tar
The above command tells pbzip2 to use four processor cores
Specify compression ratio
Options -1 through -9 select anything from the fastest to the best compression: -1 has the fastest compression speed and -9 has the highest compression ratio
Create a compressed file system
The squashfs tools create a heavily compressed, read-only file system. They can typically compress 2 GB to 3 GB of data into a 700 MB file. Linux LiveCD (or LiveUSB) images are built with squashfs: the root file system is stored in a single compressed, read-only file, which is loopback-mounted to provide a complete Linux environment. When files are needed, they are decompressed and loaded into memory on demand
All modern Linux distributions support mounting squashfs file systems. However, to create a squashfs file, you need to install squashfs-tools using the package manager
$ sudo apt-get install squashfs-tools
Use the mksquashfs command to add source directories and files to create a squashfs file
$ mksquashfs SOURCES compressedfs.squashfs
# SOURCES can be wildcards, files, or directory paths.
$ sudo mksquashfs /etc test.squashfs
Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on test.squashfs, block size 131072.
[=======================================] 1867/1867 100%
Mount squashfs files in loopback form
# mkdir /mnt/squash
# mount -o loop compressedfs.squashfs /mnt/squash
Exclude some files when creating squashfs files
Option -e excludes files and directories; here it excludes /etc/passwd and /etc/shadow
$ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow
You can also write the list of file names to be excluded to a file and pass that file with the option -ef
$ cat excludelist
/etc/passwd
/etc/shadow
$ sudo mksquashfs /etc test.squashfs -ef excludelist
If you want to use wildcards in the exclude list, you also need the -wildcards option
Backup system snapshots using rsync
Data needs to be backed up regularly. Besides local files, remote data may also be involved. rsync synchronizes files and directories between locations while minimizing the amount of data transferred. Compared with the cp command, rsync has the advantage of comparing file sizes and modification times and copying only the files that have changed. In addition, it supports remote transfers, compression, and, when run over ssh, encryption
Copy source directory to destination path
$ rsync -av source_path destination_path
# for example
$ rsync -av /home/slynux/data slynux@192.168.0.6:/home/backups/data
- -a enables archive mode (a recursive copy that preserves permissions, timestamps, ownership, and symbolic links)
- -v (verbose) prints details or progress to stdout
The above command recursively copies all files from the source path to the destination path. Either the source or the destination may be remote, but not both
Back up data to a remote server or host
$ rsync -av source_dir username@host:PATH
The following command restores the data from the remote host to the local machine
$ rsync -av username@host:PATH destination
When transmitting through the network, compressed data can significantly improve the transmission efficiency. We can use rsync's option - z to specify that data is compressed during transmission
$ rsync -avz source destination
Synchronize the contents of one directory to another
$ rsync -av /home/test/ /home/backups
This command copies the contents of the source directory (/home/test), excluding the directory itself, to the existing backups directory
Copy content, including the directory itself, to another directory
$ rsync -av /home/test /home/backups
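The effect of the trailing slash is easy to verify locally. The sketch below assumes rsync is installed (it exits early otherwise) and uses invented directory names `with_slash` and `without_slash` as the two destinations.

```shell
#!/bin/sh
# Skip the demonstration if rsync is not available.
command -v rsync >/dev/null 2>&1 || { echo "rsync is not installed"; exit 0; }

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical source tree and two empty destinations.
mkdir -p test/sub with_slash without_slash
echo "x" > test/sub/file.txt

# Trailing slash: copy the contents of test/ into with_slash/.
rsync -a test/ with_slash/

# No trailing slash: copy the directory test itself into without_slash/.
rsync -a test without_slash/

find with_slash without_slash -type f
```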
Exclude some files when archiving with rsync
The options --exclude and --exclude-from specify files that do not need to be transferred
--exclude PATTERN
You can use wildcards to specify files to exclude
$ rsync -avz /home/code/app /mnt/disk/backup/code --exclude "*.o"
Or we can specify the files to be excluded through a list file.
This requires the use of --exclude-from FILEPATH
When updating rsync backups, delete nonexistent files
By default, rsync does not delete files on the destination side that no longer exist on the source side. If you want to delete such files, you can use rsync's --delete option
$ rsync -avz SOURCE DESTINATION --delete
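The default behaviour and the effect of the delete option can be contrasted in a local test. This sketch assumes rsync is installed (it exits early otherwise); the directories `source` and `backup` and their files are hypothetical.

```shell
#!/bin/sh
# Skip the demonstration if rsync is not available.
command -v rsync >/dev/null 2>&1 || { echo "rsync is not installed"; exit 0; }

workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir source backup
echo "1" > source/keep.txt
echo "2" > source/old.txt
rsync -a source/ backup/

# Remove a file from the source, then synchronize again without --delete.
rm source/old.txt
rsync -a source/ backup/
if [ -e backup/old.txt ]; then without_delete="kept"; else without_delete="removed"; fi

# With --delete, files missing from the source are removed from the destination.
rsync -a --delete source/ backup/
if [ -e backup/old.txt ]; then with_delete="kept"; else with_delete="removed"; fi

echo "old.txt without --delete: $without_delete, with --delete: $with_delete"
```

Because --delete removes data on the destination, a dry run with `rsync -an --delete` is a cautious first step on real backups.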
Regular backup
You can create a cron task to back up regularly
$ crontab -e
# Add the following line:
0 */10 * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups
The crontab entry above runs rsync at minute 0 of every tenth hour counted from midnight, that is, at 00:00, 10:00, and 20:00
The */10 is in the hour field of the crontab syntax; /10 is a step value meaning that a backup is performed every 10 hours.
If */10 appeared in the minute field instead, the backup would be performed every 10 minutes
Differential archiving
The backup methods described so far make a complete copy of the file system as it is at that moment. This is useful if you notice a problem immediately and can recover from the latest snapshot. However, if the problem goes unnoticed until after a new snapshot has been made, the previous good data has already been overwritten by the bad data, and the snapshot is of no use.
An archive of the file system provides a history of file changes. If you need to go back to an earlier version of a damaged file, you can use it.
rsync, tar, and cpio can all be used to take daily snapshots of a file system, but this is expensive: with an independent snapshot every day, one week of snapshots needs 7 times the space of the file system being backed up.
A differential backup only saves the files that have changed since the last full backup. The dump/restore tools in Unix support this form of backup. Unfortunately, these tools were designed for tape devices, so they are not easy to use.
The find command combined with tar or cpio can achieve the same result
Use tar to create the first full backup
$ tar -cvzf /backup/full.tgz /home/user
Use the -newer option of the find command to determine which files have changed since the last full backup, and then create a new archive:
$ tar -czf day-`date +%j`.tgz `find /home/user -newer /backup/full.tgz`
The find command generates a list of all files that have changed since the last full backup (/backup/full.tgz) was created.
The date command generates a file name based on the day of the year (%j, zero-padded to three digits). The differential backup made on January 1 is therefore day-001.tgz, the one made on January 2 is day-002.tgz, and so on.
Since more and more files will have changed since the full backup, the daily differential archives grow larger and larger. When a differential archive becomes larger than expected, it is time to make a new full backup.
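The find-plus-tar approach can be rehearsed on a small tree. The sketch below is a variation under stated assumptions rather than the exact command above: it assumes GNU find and GNU tar, restricts find to regular files with `-type f` (a changed directory would otherwise be archived with everything in it), and passes names through `-print0` and `--null -T -` so that file names with spaces survive. The paths and the pinned timestamps are invented to make the run repeatable.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical home directory with two old files.
mkdir -p home/user backup
echo "a" > home/user/a.txt
echo "b" > home/user/b.txt
touch -t 201901010000 home/user/a.txt home/user/b.txt

# Full backup; its timestamp is pinned so the comparison below is repeatable.
tar -czf backup/full.tgz home/user
touch -t 202001010000 backup/full.tgz

# Only b.txt changes after the full backup.
echo "b changed" > home/user/b.txt

# Differential backup: regular files newer than the full backup.
diff_archive="backup/day-$(date +%j).tgz"
find home/user -type f -newer backup/full.tgz -print0 |
    tar -czf "$diff_archive" --null -T -

members=$(tar -tzf "$diff_archive")
echo "$diff_archive contains: $members"
```

Restoring then means extracting the full backup first and the most recent differential archive on top of it.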
Create a full disk image using fsarchiver
Fsarchiver can save the contents of the entire disk partition into a compressed archive file. Unlike tar or cpio, fsarchiver can retain the extended attributes of files and can be used to restore the current file system to disk. It can recognize and retain the file attributes of Windows and Linux systems, so it is suitable for migrating Samba mounted partitions
Create file system / partition backup
$ fsarchiver savefs backup.fsa /dev/sda1
Here backup.fsa is the resulting backup file and /dev/sda1 is the partition to be backed up
Back up multiple partitions at the same time
Use the savefs command again and pass the partitions as the last arguments of fsarchiver
$ fsarchiver savefs backup.fsa /dev/sda1 /dev/sda2
Restore partition from backup archive
Use the restfs command of fsarchiver
$ fsarchiver restfs backup.fsa id=0,dest=/dev/sda1
id=0 indicates that we want to extract the contents of the first partition from the backup archive and restore it to the partition specified by dest=/dev/sda1
To restore multiple partitions from a backup archive, use the restfs command as before:
$ fsarchiver restfs backup.fsa id=0,dest=/dev/sda1 id=1,dest=/dev/sdb1