Detailed use of HDFS

Posted by alcedema on Sat, 25 Dec 2021 17:56:36 +0100

HDFS

1. Shell operation

upload

  • -moveFromLocal: cut and paste a local file to HDFS (the local copy is removed)
    • hadoop fs -moveFromLocal <local file> <HDFS directory>
  • -copyFromLocal: copy a file from the local file system to an HDFS path
    • hadoop fs -copyFromLocal <local file> <HDFS directory>
  • -put: equivalent to copyFromLocal; put is more commonly used in production
    • hadoop fs -put <local file> <HDFS directory>
  • -appendToFile: append a local file to the end of an existing HDFS file
    • hadoop fs -appendToFile <local file> <HDFS file>



download

  • -copyToLocal: copy a file from HDFS to the local file system
    • hadoop fs -copyToLocal <HDFS file> <local directory>
  • -get: equivalent to copyToLocal; get is more commonly used in production
    • hadoop fs -get <HDFS file> <local directory>
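The download side mirrors the upload example; the HDFS file and local target paths below are assumptions:

```shell
# short form, preferred in production
hadoop fs -get /fzk/fzk.txt /tmp/fzk-copy.txt

# equivalent long form
hadoop fs -copyToLocal /fzk/fzk.txt /tmp/fzk-copy2.txt
```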



Direct operation (same as Linux command function)

  • -ls: display directory information
  • -cat: display file contents
  • -chmod, -chown: same as in the Linux file system; modify a file's permissions or owner
  • -mkdir: create path
  • -cp: copy from one path of HDFS to another path of HDFS
  • -mv: move files in HDFS directory
  • -tail: display the last 1KB of a file's data
  • -rm: delete a file or folder
  • -rm -r: recursively delete the directory and its contents
  • -du: show folder size statistics
    • hadoop fs -du -s -h <HDFS directory> (total size of the directory)
    • hadoop fs -du -h <HDFS directory> (size of each file in the directory)
  • -setrep: set the replication factor of a file in HDFS (e.g. set it to 10)
    • hadoop fs -setrep 10 <HDFS file>
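A short session tying these commands together (the /demo paths and the replication factor are illustrative; note that -setrep only sets a target, and blocks are re-replicated asynchronously):

```shell
hadoop fs -mkdir -p /demo             # create a directory
echo "abc" > /tmp/a.txt
hadoop fs -put /tmp/a.txt /demo
hadoop fs -ls /demo                   # list directory contents
hadoop fs -cat /demo/a.txt            # print file contents
hadoop fs -chmod 644 /demo/a.txt      # change permissions
hadoop fs -mv /demo/a.txt /demo/b.txt # move/rename within HDFS
hadoop fs -du -s -h /demo             # total size of the directory
hadoop fs -setrep 3 /demo/b.txt       # target replication factor of 3
hadoop fs -rm -r /demo                # recursively delete the directory
```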




2. API operation

preparation

  • Add dependency: create Maven project and add dependency (pom.xml)

    <dependency>
    <!-- The version number should match your own Hadoop version -->
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
    
  • Add log configuration: log4j.properties

    log4j.rootLogger=INFO, stdout 
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
    log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n 
    log4j.appender.logfile=org.apache.log4j.FileAppender 
    log4j.appender.logfile.File=target/spring.log 
    log4j.appender.logfile.layout=org.apache.log4j.PatternLayout 
    log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
    



File upload

  • FileSystem.copyFromLocalFile(...)

    @Test
    public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
        // 1 get file system
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
    
        // 2 upload files
        fs.copyFromLocalFile(new Path("D:\\fzk.txt"), new Path("/fzk"));
    
        // 3 close resources
        fs.close();
    }
    



File download

  • FileSystem.copyToLocalFile(...)

    @Test
    public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
        // 1 get file system
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
    
        // 2. Perform the download operation	
        // boolean delSrc: whether to delete the source file
        // Path src: the HDFS path of the file to download
        // Path dst: the local path to download to
        // boolean useRawLocalFileSystem: whether to use the raw local file system (true skips writing a .crc checksum file)
        fs.copyToLocalFile(false, new Path("/xiyou/huaguoshan/sunwukong.txt"), new Path("d:/sunwukong2.txt"), true);
    
        // 3 close resources
        fs.close();
    }
    



Modify file name

  • FileSystem.rename(...)

    @Test
    public void testRename() throws IOException, InterruptedException, URISyntaxException{
        // 1 get file system
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
        
        // 2. Modify the file name
        fs.rename(new Path("/xiyou/sunwukong.txt"), new Path("/xiyou/meihouwang.txt"));
        
        // 3 close resources
        fs.close();
    }
    



Delete files and directories

  • FileSystem.delete(...)

    @Test
    public void testDelete() throws IOException, InterruptedException, URISyntaxException{
        // 1 get file system
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
        
        // 2 execute deletion
        fs.delete(new Path("/xiyou"), true);
        
        // 3 close resources
        fs.close();
    }
    



View file details

  • View file name, permission, length and block information

    @Test
    public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
        // 1 get file system
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
    
        // 2 get file details
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus fileStatus = listFiles.next();
            System.out.println("========" + fileStatus.getPath() + "=========");
            System.out.println(fileStatus.getPermission());  //permission
            System.out.println(fileStatus.getOwner());  //Owner
            System.out.println(fileStatus.getGroup());  //group
            System.out.println(fileStatus.getLen());  //length
            System.out.println(fileStatus.getModificationTime());  //Modification time
            System.out.println(fileStatus.getReplication());  //Number of copies stored in the file
            System.out.println(fileStatus.getBlockSize());  //Block size 
            System.out.println(fileStatus.getPath().getName());  //name
            // Get block information
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            System.out.println(Arrays.toString(blockLocations));  //Block where the file is located
        }
    
        // 3 close resources
        fs.close();
    }
    



File or directory check

  • FileStatus.isFile()

    @Test
    public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
        // 1 get file configuration information
        Configuration configuration = new Configuration();
        // adjust the URI and user to match your own cluster configuration
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
    
        // 2 determine whether it is a file or a folder
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus fileStatus : listStatus) {
            // If it is a file
            if (fileStatus.isFile()) {
                System.out.println("file:" + fileStatus.getPath().getName());
            } else {
                System.out.println("directory:" + fileStatus.getPath().getName());
            }
        }
    
        // 3 close resources
        fs.close();
    }
    



Ways to set configuration parameters

  • 1. Value set in client code: Configuration.set(key, value)

    // 1 get file system
    Configuration configuration = new Configuration();
    // set dfs.replication to 2
    configuration.set("dfs.replication", "2");  
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.37.151:8020"), configuration, "root");
    
  • 2. User-defined configuration file on the classpath

    • Copy hdfs-site.xml into the project's resources directory

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <configuration>
          <!-- set dfs.replication to 1 -->
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
      </configuration>
      
  • 3. Custom configuration on the server (xxx-site.xml)

  • 4. Default configuration on the server (xxx-default.xml)



Parameter priority

  • Priority, from highest to lowest:
    • Value set in the client code
    • User-defined configuration file on the classpath
    • Custom configuration on the server (xxx-site.xml)
    • Default configuration on the server (xxx-default.xml)
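To see which value a node actually resolves from the *-default.xml and *-site.xml layers, one option (assuming the hdfs client is on the PATH and pointed at your configuration directory) is hdfs getconf:

```shell
# print the effective value of dfs.replication as resolved from the local config files
hdfs getconf -confKey dfs.replication
```

A value set via Configuration.set(...) in client code still overrides whatever this prints, since client code has the highest priority.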

Topics: Big Data Hadoop