Big Data | Common Java APIs for Operating HDFS

Posted by riffy on Wed, 17 Nov 2021 07:49:02 +0100

The previous article introduced the common HDFS commands. Each of those commands has a corresponding API: whatever can be done from the command line can also be done with the Java API. This article introduces the Java APIs commonly used with HDFS.

1. Review common commands

In the last article I went through the commands commonly used in HDFS; here is a brief review.

The ls command is used to view directories and files in the HDFS system. The command is as follows:

$ hadoop fs -ls /

The put command is used to upload local files to the HDFS system. The command is as follows:

$ hadoop fs -put test.txt /

The moveFromLocal command moves a local file to the HDFS file system, deleting the local copy. The command is as follows:

$ hadoop fs -moveFromLocal abc.txt /

The get command is used to download files from the HDFS file system to the local machine. The command is as follows:

$ hadoop fs -get /abc.txt /home/hadoop/

The rm command is used to delete files or folders in the HDFS system. The command is as follows:

$ hadoop fs -rm /test.txt

The mkdir command is used to create directories in the HDFS system. The command is as follows:

$ hadoop fs -mkdir /test

The cp command is used to copy files within the HDFS system. The command is as follows:

$ hadoop fs -cp /abc.txt /test/

The mv command is used to move and rename files in the HDFS system. The commands are as follows:

$ hadoop fs -mv /abc/abc.txt /test/
$ hadoop fs -mv /test/abc.txt /test/abcabc.txt

The cat command is used to output the contents of a file in the HDFS file system. The command is as follows:

$ hadoop fs -cat /test/abcabc.txt

The appendToFile command appends the contents of one or more local files to a file in the HDFS system. The command is as follows:

$ hadoop fs -appendToFile abc.txt /abc.txt

The above briefly reviews the common commands of the HDFS file system. Next, let's go through the common Java APIs of HDFS.

2. Introduce the dependency

With the Java API of HDFS you can operate on files in the HDFS file system: create them, delete them, read them, and so on. Create a Maven project and add the dependency below, and the preparation is complete:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.8.2</version>
</dependency>

With this dependency in place, the FileSystem tool class of the HDFS Java API can handle all of the operations we need. Let's go through them below.
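For reference, all of the snippets below are assumed to live in one class (called HdfsFileSystem in this article) with roughly the following imports; a couple of later sketches need one or two extras, noted where they appear:

import java.io.IOException;
import java.io.InputStream;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;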

3. File list

Listing files is simple; the code is as follows:

public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Set the HDFS access address (fs.default.name is the legacy key, now superseded by fs.defaultFS)
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    List<String> filesUnderFolder = HdfsFileSystem.getFileList(fs, new Path("hdfs:/"));
    filesUnderFolder.forEach(System.out::println);
}

public static List<String> getFileList(FileSystem fs, Path folderPath) throws IOException {
    List<String> paths = new ArrayList<>();

    if (fs.exists(folderPath)) {
        // listStatus() returns the entries directly under the given path
        for (FileStatus status : fs.listStatus(folderPath)) {
            paths.add(status.getPath().toString());
        }
    }

    return paths;
}

In the custom getFileList method above, the listStatus() method of the FileSystem class returns all files and directories directly under the HDFS / directory. The output is as follows:

hdfs://centos01:9000/abc
hdfs://centos01:9000/abc.txt
hdfs://centos01:9000/depInput
hdfs://centos01:9000/depOutput
hdfs://centos01:9000/input
hdfs://centos01:9000/output
hdfs://centos01:9000/scoreInput
hdfs://centos01:9000/scoreOutput
hdfs://centos01:9000/secondInput
hdfs://centos01:9000/secondOutput
hdfs://centos01:9000/test
hdfs://centos01:9000/tmp

The above output is the file list of the HDFS instance in my virtual machine. To list another directory, just pass in the corresponding path. And to list every file recursively, check whether each entry is a directory and, if so, call the method on it recursively, as in the sketch below.
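A minimal recursive variant (my own sketch, not part of the original tool class) might look like this:

public static List<String> getFileListRecursive(FileSystem fs, Path folderPath) throws IOException {
    List<String> paths = new ArrayList<>();

    if (fs.exists(folderPath)) {
        for (FileStatus status : fs.listStatus(folderPath)) {
            paths.add(status.getPath().toString());
            // Descend into subdirectories to collect their contents as well
            if (status.isDirectory()) {
                paths.addAll(getFileListRecursive(fs, status.getPath()));
            }
        }
    }

    return paths;
}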

4. Create directory

To create a directory, use the mkdirs method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    String path = "hdfs:/hdfsDir";
    HdfsFileSystem.createDir(path);
}

/**
 * Create an HDFS directory at the given path
 */
public static void createDir(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Create directory
    boolean created = fs.mkdirs(new Path(pathString));
    if (created) {
        System.out.println("Directory created successfully");
    } else {
        System.out.println("Failed to create directory");
    }
    fs.close();
}

The mkdirs method returns a boolean: true means the directory was created, false means creation failed. Check with an HDFS command:

$ hadoop fs -ls / | grep hdfsDir
drwxr-xr-x   - Administrator supergroup          0 2021-11-12 10:09 /hdfsDir

You can see that the /hdfsDir directory was created successfully. Note that, like mkdir -p, mkdirs also creates any missing parent directories along the path.

5. File creation

The file can be created by using the create method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    String content = "1234";
    HdfsFileSystem.createFile(path, content);
}

/**
 * Create a file and write the given content to it
 */
public static void createFile(String pathString, String content) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Open an output stream (create() overwrites an existing file by default)
    FSDataOutputStream outputStream = fs.create(new Path(pathString));
    // Write the file contents
    outputStream.write(content.getBytes());
    outputStream.close();
    fs.close();
    System.out.println("File created successfully");
}

The above code creates a file named fileAbc.txt in the root directory of HDFS and writes 1234 to it. Use the following commands to check that the file was created and its content written:

$ hadoop fs -ls / | grep fileAbc
-rw-r--r--   3 Administrator supergroup          4 2021-11-12 10:17 /fileAbc.txt
$ hadoop fs -cat /fileAbc.txt
1234
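Incidentally, the appendToFile command reviewed at the start also has a Java counterpart: the append method of the FileSystem class. A minimal sketch, assuming the cluster is configured to allow appends:

/**
 * Append content to an existing HDFS file (sketch)
 */
public static void appendToFile(String pathString, String content) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Open an output stream positioned at the end of the existing file
    FSDataOutputStream outputStream = fs.append(new Path(pathString));
    outputStream.write(content.getBytes());
    outputStream.close();
    fs.close();
}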

6. Output of file contents

To output a file's contents, open an input stream with the open method of the FileSystem class and read from it. The code is as follows:

public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    HdfsFileSystem.fileSystemCat(path);
}

/**
 * Query HDFS file content and output
 */
public static void fileSystemCat(String pathString) throws IOException {
    Configuration conf = new Configuration();
    // Set HDFS access address
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // Open file input stream
    InputStream in = fs.open(new Path(pathString));
    // Output file content
    IOUtils.copyBytes(in, System.out, 4096, false);
    // Close input stream
    IOUtils.closeStream(in);
}

Run the code to see that the contents of the file are output as follows:

1234
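If you want the contents as a String rather than printed to standard output, one possible variant (my sketch; it additionally needs java.io.ByteArrayOutputStream) buffers the stream in memory first, so it is only suitable for small files:

public static String readFileToString(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    InputStream in = fs.open(new Path(pathString));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Copy the whole file into the in-memory buffer
    IOUtils.copyBytes(in, out, 4096, false);
    IOUtils.closeStream(in);
    fs.close();
    return out.toString("UTF-8");
}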

7. Delete file

The file is deleted using the deleteOnExit method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    HdfsFileSystem.deleteFile(path);
}

/**
 * Delete file
 */
public static void deleteFile(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(pathString);
    // Mark the file for deletion when the FileSystem is closed
    boolean isok = fs.deleteOnExit(path);

    if (isok) {
        System.out.println("Delete succeeded");
    } else {
        System.out.println("Deletion failed");
    }

    fs.close();
}

Checking with the ls command shows that the file has been deleted.
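Note that deleteOnExit only marks the path and performs the actual deletion when the FileSystem is closed, which is why the fs.close() call above matters. For an immediate deletion, the delete method of the FileSystem class can be used instead; a sketch:

public static void deleteNow(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Second argument: delete directories recursively; false suffices for a single file
    boolean deleted = fs.delete(new Path(pathString), false);
    System.out.println(deleted ? "Delete succeeded" : "Deletion failed");
    fs.close();
}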

8. Read metadata of a file/directory

The metadata of the file / directory can be read by using the getFileStatus method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    String content = "1234";
    HdfsFileSystem.createFile(path, content);
    HdfsFileSystem.fileStatusCat(path);
}

/**
 * Get metadata information of file or directory
 */
public static void fileStatusCat(String pathString) throws IOException {
    // Create Configuration object
    Configuration conf = new Configuration();
    // Set HDFS access address
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    FileStatus fileStatus = fs.getFileStatus(new Path(pathString));

    // Determine whether it is a folder or a file
    if (fileStatus.isDirectory()) {
        System.out.println("This is a folder");
    } else {
        System.out.println("This is a file");
    }

    // Output metadata information
    System.out.println("File path:" + fileStatus.getPath());
    System.out.println("Document modification date:" + new Timestamp(fileStatus.getModificationTime()).toString());
    System.out.println("File last accessed:" + new Timestamp(fileStatus.getAccessTime()).toString());
    System.out.println("File length:" + fileStatus.getLen());
    System.out.println("Number of file backups:" + fileStatus.getReplication());
    System.out.println("File block size:" + fileStatus.getBlockSize());
    System.out.println("File owner:" + fileStatus.getOwner());
    System.out.println("File group:" + fileStatus.getGroup());
    System.out.println("File permissions:" + fileStatus.getPermission().toString());
}

Various information of the file can be obtained through FileStatus. The above output is as follows:

This is a file
File path: hdfs://centos01:9000/fileAbc.txt
File modification time: 2021-11-12 11:02:12.797
File last accessed: 2021-11-12 11:02:12.438
File length: 4
Number of replicas: 3
File block size: 134217728
File owner: Administrator
File group: supergroup
File permissions: rw-r--r--

Here, we get the file path, modification date, last access date, file length and other information.
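Beyond FileStatus, the FileSystem class can also report where a file's blocks are physically stored, via getFileBlockLocations. A small sketch (my addition; it needs an extra import of org.apache.hadoop.fs.BlockLocation):

public static void printBlockLocations(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path(pathString));
    // Ask for the locations of every block from offset 0 to the end of the file
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
        System.out.println("Block at offset " + block.getOffset()
                + ", hosts: " + String.join(",", block.getHosts()));
    }
    fs.close();
}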

9. Upload local files to HDFS

File upload can be completed with the copyFromLocalFile method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    HdfsFileSystem.uploadFileToHDFS("d:/mysql.docx", "hdfs:/");
}

/**
 * Upload local files to HDFS
 */
public static void uploadFileToHDFS(String srcPath, String dstPath) throws IOException {
    // Create Configurator
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // Create paths that Hadoop can work with
    // Local directory / file
    Path src = new Path(srcPath);
    // HDFS directory / file
    Path dst = new Path(dstPath);
    // Copy the local file up to the HDFS file system
    fs.copyFromLocalFile(src, dst);
    System.out.println("File uploaded successfully");
}

Check the upload with the following command:

$ hadoop fs -ls / | grep mysql
-rw-r--r--   3 Administrator supergroup    1470046 2021-11-12 11:06 /mysql.docx
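copyFromLocalFile also has an overload with delSrc and overwrite flags; with delSrc set to true it behaves like the moveFromLocal command reviewed earlier, deleting the local source after the upload. For example:

// delSrc = true: delete the local source after uploading
// overwrite = true: replace any existing file at the destination
fs.copyFromLocalFile(true, true, new Path("d:/mysql.docx"), new Path("hdfs:/"));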

10. Download HDFS files locally

To download an HDFS file to the local machine, use the copyToLocalFile method of the FileSystem class. The code is as follows:

public static void main(String[] args) throws IOException {
    HdfsFileSystem.downloadFileToLocal("hdfs:/mysql.docx", "d:/test.docx");
}

/**
 * Download files locally
 */
public static void downloadFileToLocal(String srcPath, String dstPath) throws IOException {
    // Create Configurator
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // Create a file system path that can be used by hadoop
    Path src = new Path(srcPath);
    Path dst = new Path(dstPath);
    // Copy the file from HDFS to the local file system; delSrc = false keeps the
    // source, and useRawLocalFileSystem = true skips writing a local .crc checksum file
    fs.copyToLocalFile(false, src, dst, true);
    System.out.println("File downloaded successfully");
}

Go to local disk D and check: the file has been downloaded successfully.

11. Summary

HDFS is a core module of the Hadoop project, and operating HDFS through its Java API is simple and convenient. Have you noticed that HDFS, as a file system, lets you create, append to, delete, and read files, but offers no way to modify a file in place? That is a feature that distinguishes HDFS from other file systems. What were the authors of HDFS after? Let's save that for a later discussion!