Hadoop 3 HDFS Shell Operations: Uploading Files

Posted by lkalik on Wed, 22 Dec 2021 03:37:58 +0100

1. copyFromLocal

$ hadoop fs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:

  -p                 Preserves access and modification times, ownership and the
                     mode.
  -f                 Overwrites the destination if it already exists.
  -t <thread count>  Number of threads to be used, default is 1.
  -l                 Allow DataNode to lazily persist the file to disk. Forces
                     replication factor of 1. This flag will result in reduced
                     durability. Use with care.
  -d                 Skip creation of temporary file(<dst>._COPYING_).

Copies a local file to HDFS. If the destination file already exists, the copy fails unless the -f option is given.

The following commands create a .txt file locally, create a directory on HDFS (similar to the Linux mkdir command), and then upload the file into that directory:

mkdir weiguo
echo caocao >> weiguo/caocao.txt
hadoop fs -mkdir /weiguo
hadoop fs -copyFromLocal weiguo/caocao.txt /weiguo

In the NameNode web UI, navigate to the directory, click caocao.txt, and then click "Head the file" to view the first 32 KB of the file's contents.

By default, the file's access and modification times, owner, group, and permission mode are not preserved. To keep them, use the -p (preserve) option:

echo guojia >> weiguo/guojia.txt
chmod 777 weiguo/guojia.txt
hadoop fs -copyFromLocal -p weiguo/guojia.txt /weiguo

If the file already exists on HDFS, overwrite it with the -f option:

hadoop fs -copyFromLocal -p -f weiguo/guojia.txt /weiguo

The -t option specifies the number of threads to use (default 1). It is most useful when uploading a directory containing many files:

hadoop fs -copyFromLocal -p -f -t 5 weiguo/guojia.txt /weiguo
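Since -t mainly helps with many files, a whole directory can also be uploaded in one command. A sketch (the target name /weiguo-copy is illustrative, not from the original post):

```shell
# Upload the entire local weiguo directory with 3 copy threads.
# /weiguo-copy is a hypothetical destination directory on HDFS.
hadoop fs -copyFromLocal -t 3 weiguo /weiguo-copy
```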

The -l option allows the DataNode to lazily persist the file to disk and forces a replication factor of 1. This reduces durability; use it with care.

echo xunyu >> weiguo/xunyu.txt
hadoop fs -copyFromLocal -l weiguo/xunyu.txt /weiguo

The -d option skips the temporary file (<dst>._COPYING_). By default, the client writes the data to this temporary file on HDFS and renames it to the final name once the copy completes; with -d, the data is written directly to the destination path.

echo xuhuang >> weiguo/xuhuang.txt
hadoop fs -copyFromLocal -d weiguo/xuhuang.txt /weiguo

The summary is as follows:

Option             Description
-p                 Preserve access and modification times, ownership and mode
-f                 Overwrite the destination if it already exists
-t <thread count>  Number of threads to use (default 1)
-l                 Allow the DataNode to lazily persist the file to disk; forces a replication factor of 1 and reduces durability, use with care
-d                 Skip creation of the temporary file (<dst>._COPYING_)

2. put

$ hadoop fs -help put
-put [-f] [-p] [-l] [-d] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:

  -p  Preserves access and modification times, ownership and the mode.
  -f  Overwrites the destination if it already exists.
  -l  Allow DataNode to lazily persist the file to disk. Forces
         replication factor of 1. This flag will result in reduced
         durability. Use with care.

  -d  Skip creation of temporary file(<dst>._COPYING_).

put is similar to the copyFromLocal command above, except that it does not support the -t multi-threaded upload option. In practice, put is the more commonly used command in shell operations.
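Usage mirrors copyFromLocal. A sketch reusing the /weiguo directory from the previous section (the file name simayi.txt is illustrative, not from the original post):

```shell
# Create a small local file and upload it with put;
# -f overwrites the destination if it already exists.
echo simayi >> weiguo/simayi.txt
hadoop fs -put -f weiguo/simayi.txt /weiguo
```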

3. moveFromLocal

$ hadoop fs -help moveFromLocal
-moveFromLocal <localsrc> ... <dst> :
  Same as -put, except that the source is deleted after it's copied.

Like put and copyFromLocal, this uploads local files to HDFS; the difference is that the local file is deleted after the copy.

echo zhangliao >> weiguo/zhangliao.txt
hadoop fs -moveFromLocal weiguo/zhangliao.txt /weiguo

4. appendToFile

$ hadoop fs -help appendToFile
-appendToFile <localsrc> ... <dst> :
  Appends the contents of all the given local files to the given dst file. The dst
  file will be created if it does not exist. If <localSrc> is -, then the input is
  read from stdin.

Appends all specified local files to the end of the specified HDFS file.

mkdir shuguo
echo liu >> shuguo/liu.txt
echo bei >> shuguo/bei.txt
hadoop fs -mkdir /shuguo
hadoop fs -put shuguo/liu.txt /shuguo/liubei.txt

Before appending, /shuguo/liubei.txt contains a single line: liu. Now append bei.txt to it:

hadoop fs -appendToFile shuguo/bei.txt /shuguo/liubei.txt

After appending, /shuguo/liubei.txt contains the lines liu and bei.
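Without the web UI, the result can also be checked from the command line with -cat (assuming the commands above succeeded):

```shell
# Print the merged file; it should now contain both lines.
hadoop fs -cat /shuguo/liubei.txt
```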

If the HDFS file does not exist, it will be created. If the specified local file is -, the content is read from standard input (stdin):

hadoop fs -appendToFile - /shuguo/zhaoyun.txt
zhaoyun

Press Ctrl+C to exit (Ctrl+D, end-of-file, also ends the input cleanly).

The file /shuguo/zhaoyun.txt now contains the line zhaoyun.
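As the help text notes, appendToFile also accepts several local sources at once. A sketch (the target name all.txt is illustrative, not from the original post):

```shell
# Append liu.txt and bei.txt, in order, to a new HDFS file.
hadoop fs -appendToFile shuguo/liu.txt shuguo/bei.txt /shuguo/all.txt
```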

Topics: Hadoop