The previous article introduced the common HDFS commands. Each of those commands has a corresponding API, so anything that can be done from the command line can also be done with the Java API. This article introduces the Java APIs commonly used with HDFS.
1, Review common commands
In the last article, I sorted out the commands commonly used in HDFS. Here is a brief review.
The ls command is used to view directories and files in the HDFS system. The commands are as follows:
$ hadoop fs -ls /
The put command is used to upload local files to the HDFS system. The commands are as follows:
$ hadoop fs -put test.txt /
The moveFromLocal command moves the local file to the HDFS file system and deletes the local file. The commands are as follows:
$ hadoop fs -moveFromLocal abc.txt /
The get command is used to download files from the HDFS file system to the local file system. The command is as follows:
$ hadoop fs -get /abc.txt /home/hadoop/
The rm command is used to delete files or directories in the HDFS system. The command is as follows:
$ hadoop fs -rm /test.txt
The mkdir command is used to create directories in the HDFS system. The commands are as follows:
$ hadoop fs -mkdir /test
The cp command is used to copy files within the HDFS system. The command is as follows:
$ hadoop fs -cp /abc.txt /abc
The mv command is used in the HDFS system to move files and rename files. The commands are as follows:
$ hadoop fs -mv /abc/abc.txt /test/
$ hadoop fs -mv /test/abc.txt /test/abcabc.txt
The cat command is used to output the contents of a file in the HDFS file system. The command is as follows:
$ hadoop fs -cat /test/abcabc.txt
The appendToFile command appends the contents of a single or multiple files from the local system to the files of the HDFS system. The commands are as follows:
$ hadoop fs -appendToFile abc.txt /abc.txt
The above content briefly reviews the common commands of the HDFS file system. Next, let's sort out the common Java APIs of HDFS.
2, Introduce dependency
Using the Java API of HDFS, you can operate on files in the HDFS file system, such as creating, deleting, and reading files. Create a Maven project and add the following dependency; with that, the preparation is complete:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.8.2</version>
</dependency>
After introducing this dependency, we can use the FileSystem tool class of the HDFS Java API to complete our operations. Let's learn about it below.
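Every example in the rest of this article obtains a FileSystem instance in the same way. As a minimal sketch of that shared pattern (using the NameNode address hdfs://centos01:9000 of my test cluster), it looks like this:

Configuration conf = new Configuration();
// Set the HDFS access address (the NameNode of the cluster used in this article)
conf.set("fs.default.name", "hdfs://centos01:9000");
// Get a FileSystem instance for that cluster
FileSystem fs = FileSystem.get(conf);
// ... perform file operations with fs ...
fs.close();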
3, File list
Listing files is straightforward with the listStatus method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Set HDFS access address
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    List<String> filesUnderFolder = HdfsFileSystem.getFileList(fs, new Path("hdfs:/"));
    filesUnderFolder.forEach(System.out::println);
}

public static List<String> getFileList(FileSystem fs, Path folderPath) throws IOException {
    List<String> paths = new ArrayList<>();
    if (fs.exists(folderPath)) {
        FileStatus[] fileStatus = fs.listStatus(folderPath);
        for (int i = 0; i < fileStatus.length; i++) {
            paths.add(fileStatus[i].getPath().toString());
        }
    }
    return paths;
}
In the above code, the custom method getFileList returns all files and directories under the HDFS root directory (/) through the listStatus() method of the FileSystem class. The output is as follows:
hdfs://centos01:9000/abc
hdfs://centos01:9000/abc.txt
hdfs://centos01:9000/depInput
hdfs://centos01:9000/depOutput
hdfs://centos01:9000/input
hdfs://centos01:9000/output
hdfs://centos01:9000/scoreInput
hdfs://centos01:9000/scoreOutput
hdfs://centos01:9000/secondInput
hdfs://centos01:9000/secondOutput
hdfs://centos01:9000/test
hdfs://centos01:9000/tmp
The above output is the file list of HDFS in my virtual machine. To display the files and directories under another directory, just pass in the corresponding path. To list every file in the whole tree, check whether each entry is a directory and, if so, recurse into it, as in the sketch below.
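A minimal recursive version might look like the following sketch; it uses the same listStatus call as above, and the method name getFileListRecursive is just for illustration:

public static List<String> getFileListRecursive(FileSystem fs, Path folderPath) throws IOException {
    List<String> paths = new ArrayList<>();
    if (fs.exists(folderPath)) {
        for (FileStatus status : fs.listStatus(folderPath)) {
            paths.add(status.getPath().toString());
            // If the entry is a directory, recurse into it to collect its contents as well
            if (status.isDirectory()) {
                paths.addAll(getFileListRecursive(fs, status.getPath()));
            }
        }
    }
    return paths;
}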
4, Create directory
To create a directory, use the mkdirs method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    String path = "hdfs:/hdfsDir";
    HdfsFileSystem.createDir(path);
}

/**
 * Create an HDFS directory
 */
public static void createDir(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Create directory
    boolean created = fs.mkdirs(new Path(pathString));
    if (created) {
        System.out.println("Directory created successfully");
    } else {
        System.out.println("Failed to create directory");
    }
    fs.close();
}
The mkdirs method returns a boolean: true indicates that the creation succeeded, false that it failed. Verify the result with the HDFS command line as follows:
$ hadoop fs -ls / | grep hdfsDir
drwxr-xr-x   - Administrator supergroup          0 2021-11-12 10:09 /hdfsDir
You can see that the / hdfsDir directory was created successfully.
5, File creation
A file can be created with the create method of the FileSystem class, which returns an output stream for writing the file's content. The code is as follows:
public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    String context = "1234";
    HdfsFileSystem.createFile(path, context);
}

/**
 * Create a file and write the given content to it
 */
public static void createFile(String pathString, String context) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Open an output stream
    FSDataOutputStream outputStream = fs.create(new Path(pathString));
    // Write file contents
    outputStream.write(context.getBytes());
    outputStream.close();
    fs.close();
    System.out.println("File created successfully");
}
The above code creates a file named fileAbc.txt in the root directory of HDFS and writes 1234 to it. Use the following commands to check whether the file was created and the content was written successfully:
$ hadoop fs -ls / | grep fileAbc
-rw-r--r--   3 Administrator supergroup          4 2021-11-12 10:17 /fileAbc.txt
$ hadoop fs -cat /fileAbc.txt
1234
6, Output of file contents
The contents of a file can be output by reading them through an input stream obtained with the open method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    HdfsFileSystem.fileSystemCat(path);
}

/**
 * Query HDFS file content and output it
 */
public static void fileSystemCat(String pathString) throws IOException {
    Configuration conf = new Configuration();
    // Set HDFS access address
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // Open file input stream
    InputStream in = fs.open(new Path(pathString));
    // Output file content
    IOUtils.copyBytes(in, System.out, 4096, false);
    // Close input stream
    IOUtils.closeStream(in);
}
Run the code to see that the contents of the file are output as follows:
1234
7, Delete file
The file is deleted here with the deleteOnExit method of the FileSystem class, which marks the path for deletion and removes it when the FileSystem instance is closed. The code is as follows:
public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    HdfsFileSystem.deleteFile(path);
}

/**
 * Delete file
 */
public static void deleteFile(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(pathString);
    // Mark the file for deletion; it is removed when the FileSystem is closed
    boolean isok = fs.deleteOnExit(path);
    if (isok) {
        System.out.println("Delete succeeded");
    } else {
        System.out.println("Deletion failed");
    }
    fs.close();
}
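If you want the file removed immediately instead of when the FileSystem is closed, the delete method of the FileSystem class can be used. Here is a minimal sketch (same configuration as above; the helper name deleteFileNow is just for illustration, and the second argument only matters when deleting directories):

public static void deleteFileNow(String pathString) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Delete right away; false means no recursive deletion (use true for non-empty directories)
    boolean deleted = fs.delete(new Path(pathString), false);
    System.out.println(deleted ? "Delete succeeded" : "Deletion failed");
    fs.close();
}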
Checking with the command line confirms that the file has been deleted.
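For instance, a check like the following (reusing the fileAbc.txt file name from the previous sections) should no longer return a matching entry:

$ hadoop fs -ls / | grep fileAbc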
8, Read metadata of file / directory
The metadata of the file / directory can be read by using the getFileStatus method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    String path = "hdfs:/fileAbc.txt";
    String context = "1234";
    HdfsFileSystem.createFile(path, context);
    HdfsFileSystem.fileStatusCat(path);
}

/**
 * Get metadata information of a file or directory
 */
public static void fileStatusCat(String pathString) throws IOException {
    // Create Configuration object
    Configuration conf = new Configuration();
    // Set HDFS access address
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    FileStatus fileStatus = fs.getFileStatus(new Path(pathString));
    // Determine whether it is a folder or a file
    if (fileStatus.isDirectory()) {
        System.out.println("This is a folder");
    } else {
        System.out.println("This is a file");
    }
    // Output metadata information
    System.out.println("File path: " + fileStatus.getPath());
    System.out.println("File modification time: " + new Timestamp(fileStatus.getModificationTime()).toString());
    System.out.println("File last access time: " + new Timestamp(fileStatus.getAccessTime()).toString());
    System.out.println("File length: " + fileStatus.getLen());
    System.out.println("Number of file replicas: " + fileStatus.getReplication());
    System.out.println("File block size: " + fileStatus.getBlockSize());
    System.out.println("File owner: " + fileStatus.getOwner());
    System.out.println("File group: " + fileStatus.getGroup());
    System.out.println("File permissions: " + fileStatus.getPermission().toString());
}
Various pieces of information about the file can be obtained through FileStatus. The output of the above code is as follows:
This is a file
File path: hdfs://centos01:9000/fileAbc.txt
File modification time: 2021-11-12 11:02:12.797
File last access time: 2021-11-12 11:02:12.438
File length: 4
Number of file replicas: 3
File block size: 134217728
File owner: Administrator
File group: supergroup
File permissions: rw-r--r--
Here we get the file path, modification time, last access time, file length, replica count, and other information.
9, Upload local files to HDFS
File upload can be completed with the copyFromLocalFile method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    HdfsFileSystem.uploadFileToHDFS("d:/mysql.docx", "hdfs:/");
}

/**
 * Upload a local file to HDFS
 */
public static void uploadFileToHDFS(String srcPath, String dstPath) throws IOException {
    // Create configuration
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // Local directory / file
    Path src = new Path(srcPath);
    // HDFS directory / file
    Path dst = new Path(dstPath);
    // Copy the local file to the HDFS file system
    fs.copyFromLocalFile(src, dst);
    System.out.println("File uploaded successfully");
}
Use the command line to verify the upload. The command is as follows:
$ hadoop fs -ls / | grep mysql
-rw-r--r--   3 Administrator supergroup    1470046 2021-11-12 11:06 /mysql.docx
10, Download HDFS files locally
To download an HDFS file to the local file system, use the copyToLocalFile method of the FileSystem class. The code is as follows:
public static void main(String[] args) throws IOException {
    HdfsFileSystem.downloadFileToLocal("hdfs:/mysql.docx", "d:/test.docx");
}

/**
 * Download an HDFS file to the local file system
 */
public static void downloadFileToLocal(String srcPath, String dstPath) throws IOException {
    // Create configuration
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    // Get FileSystem instance
    FileSystem fs = FileSystem.get(conf);
    // HDFS source path and local destination path
    Path src = new Path(srcPath);
    Path dst = new Path(dstPath);
    // Copy the file from HDFS to the local file system;
    // delSrc = false keeps the HDFS source, useRawLocalFileSystem = true avoids writing a local .crc checksum file
    fs.copyToLocalFile(false, src, dst, true);
    System.out.println("File downloaded successfully");
}
Check local disk D and you will see that the file has been downloaded successfully.
11, Summary
HDFS is a core module of the Hadoop project, and operating on it through the HDFS Java API is simple and convenient. Have you noticed that HDFS, as a file system, lets you create, append to, delete, and read files, but provides no way to modify a file's existing contents? This is a feature that distinguishes HDFS from other file systems. What were the designers of HDFS aiming for? Let's save that for a later discussion!
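As a small complement to the operations above, appending to an existing file is also possible from Java through the append method of the FileSystem class. The following is only a minimal sketch, reusing the centos01:9000 address and the fileAbc.txt file from the earlier examples; on very small test clusters, additional client settings may be needed for append to succeed:

public static void appendToFile(String pathString, String context) throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://centos01:9000");
    FileSystem fs = FileSystem.get(conf);
    // Open an output stream positioned at the end of the existing file
    FSDataOutputStream out = fs.append(new Path(pathString));
    out.write(context.getBytes());
    out.close();
    fs.close();
}

For example, calling appendToFile("hdfs:/fileAbc.txt", "5678") would leave the file containing 12345678.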