Big data Hadoop 3, 1.3: detailed introduction notes on HDFS

A table of contents on the right side of the page lets you jump to any section by its title; if it is not on the right, look on the left. Main article link: https://blog.csdn.net/grd_java/article/details/115639179 Chapter I: environment setup: https://blog.csdn.net/grd_java/article/details/115693312 If you hav ...

Posted by algarve4me on Sat, 05 Mar 2022 05:22:59 +0100

Hadoop 07: introduction to SecondaryNameNode and DataNode

1. Introduction to the SecondaryNameNode: We already introduced the SecondaryNameNode when analyzing the edits log files earlier; here is a summary to highlight it. The SecondaryNameNode is mainly responsible for periodically merging the contents of the edits file into the fsimage. This merge operation is called a checkpoint. When merging, the co ...

Posted by abselect on Wed, 02 Mar 2022 01:47:02 +0100
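To make the checkpoint idea from the SecondaryNameNode excerpt concrete, here is a minimal Python sketch. It is a toy model, not HDFS's real on-disk format or code path: the fsimage is modeled as a path-to-metadata dictionary and the edits file as an append-only list of operations, and a checkpoint replays the edits onto a copy of the image and starts an empty log. The two trigger properties and their defaults are taken from Hadoop's hdfs-default.xml; the class and function names (NameNodeState, checkpoint, should_checkpoint) and the "create"/"delete" operation names are hypothetical.

```python
from dataclasses import dataclass, field

# Defaults documented in hdfs-default.xml (Hadoop 2.x/3.x)
CHECKPOINT_PERIOD_S = 3600      # dfs.namenode.checkpoint.period (seconds)
CHECKPOINT_TXNS = 1_000_000     # dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last: float, uncheckpointed_txns: int,
                      period_s: float = CHECKPOINT_PERIOD_S,
                      txns: int = CHECKPOINT_TXNS) -> bool:
    """A checkpoint is due once either the time or the transaction threshold is reached."""
    return seconds_since_last >= period_s or uncheckpointed_txns >= txns


@dataclass
class NameNodeState:
    # Toy model: fsimage is a path -> metadata snapshot, edits is an append-only op log
    fsimage: dict = field(default_factory=dict)
    edits: list = field(default_factory=list)

    def log(self, op: str, path: str, meta: object = None) -> None:
        # Every namespace change is appended to the edit log, not applied to fsimage
        self.edits.append((op, path, meta))


def checkpoint(state: NameNodeState) -> NameNodeState:
    """Replay the edits onto a copy of the fsimage and start a fresh, empty edit log."""
    image = dict(state.fsimage)
    for op, path, meta in state.edits:
        if op == "create":
            image[path] = meta
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return NameNodeState(fsimage=image, edits=[])
```

For example, logging a create of /a.txt, a create of /b.txt and a delete of /a.txt, then calling checkpoint(), yields an fsimage containing only /b.txt and an empty edit log. The input state is left untouched, which loosely mirrors the idea that the merge is done on copies while the NameNode keeps serving requests.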

HDFS high availability architecture

First, we need to set up three virtual machines (this is only a demonstration). For the setup process, please refer to the article "Using virtual machines to complete a fully distributed Hadoop build" (CSDN blog). After completing the fully distributed Hadoop setup in the previous article, you can do the ...

Posted by antileon on Sat, 26 Feb 2022 14:10:24 +0100

Hive tutorial (06) - Hive SerDe serialization and deserialization

01 Introduction: The previous tutorials gave a preliminary understanding of Hive's data model, data types, and operation commands. Interested readers can refer to: Hive tutorial (01) - getting to know Hive; Hive tutorial (02) - Hive installation; Hive tutorial (03) - Hive data model; Hive tutorial (04) - Hive data types; Hive tutorial (05) ...

Posted by Jackanape on Tue, 22 Feb 2022 04:24:36 +0100

Understanding Hudi's core concepts from its persisted files

[Overview] This is the first article in the Hudi series. It first builds an understanding of Hudi's core concepts and stored file format, then gradually covers usage (writing into Hudi from Spark/Flink, syncing Hudi to Hive, etc.) and internals (compaction mechanism, indexing, clustering, etc.). [What is a d ...

Posted by QbertsBrother on Sun, 20 Feb 2022 05:10:02 +0100

Hadoop pseudo-distributed cluster installation and deployment

Deployment: download the installation package; upload and unzip the installation package; configure environment variables; modify the configuration files; format HDFS; modify the script files; start and verify; stop the cluster. Notes: 1. A JDK environment is assumed to be installed and configured. 2. A CentOS 7 Linux environment is assumed to be installed and configured. 3. Change the curre ...

Posted by MasterACE14 on Sat, 19 Feb 2022 21:33:02 +0100
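The "modify the configuration files" step in the pseudo-distributed deployment above usually means editing Hadoop's *-site.xml files. The excerpt does not show the author's values, so the ones below are an assumption: they are the single-node settings from the Apache Hadoop setup guide (fs.defaultFS set to hdfs://localhost:9000 in core-site.xml, dfs.replication set to 1 in hdfs-site.xml). This Python sketch only renders such property maps into the configuration/property/name/value layout those files use; the helper name hadoop_config_xml is hypothetical, and the output omits the optional XML declaration.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml body: <configuration> with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed single-node (pseudo-distributed) values, as in the Apache Hadoop setup guide
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
HDFS_SITE = {"dfs.replication": "1"}

if __name__ == "__main__":
    print(hadoop_config_xml(CORE_SITE))
    print(hadoop_config_xml(HDFS_SITE))
```

Keeping the values in plain dictionaries makes it easy to switch between single-node and multi-node settings without hand-editing XML; the generated text would still need to be written into $HADOOP_HOME/etc/hadoop/ before formatting HDFS.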

Processing images with MapReduce

Reference link 1, reference link 2. The code comes from link 2, with my own modifications. My knowledge is limited, so please point out any mistakes. Environment: Hadoop 3.2.1 and CentOS 7. The code is written on Windows, then packaged and submitted to the Hadoop cluster on CentOS to run. Idea: put the pictures on HDFS, and then write the path of each im ...

Posted by benyhanna on Fri, 18 Feb 2022 06:16:31 +0100

Using Docker to build a Hadoop cluster

I. Environment: 1. Ubuntu 20 2. Hadoop 3.1.4 3. JDK 1.8_301 II. Specific steps: Pull the latest Ubuntu image. Use a mount to transfer the JDK, Hadoop, and other installation packages into the mounted directory, either with Xftp or with the scp command on the command line. Enter the Ubuntu container with docker exec -it <container-id> /bin/bash. Update a ...

Posted by The_Assistant on Wed, 16 Feb 2022 14:32:26 +0100

Big data Hadoop installation and configuration

1. Hadoop pseudo-distributed configuration. 1) Create a hadoop user: sudo useradd -m hadoop -s /bin/bash creates the user, sudo passwd hadoop sets its password, and sudo adduser hadoop sudo grants it administrator privileges. Log out and log in ...

Posted by jamesh on Sat, 12 Feb 2022 04:41:59 +0100

Distributed computing framework MapReduce

Introduction: MapReduce is a cluster-based, high-performance parallel computing platform; it is also a software framework for parallel computation and a parallel programming model and method. Characteristics: ① Distributed and reliable: operations on the data set are distributed to multiple nodes in the cluster to ac ...

Posted by warren on Thu, 10 Feb 2022 19:39:51 +0100
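The parallel programming model described in the MapReduce excerpt is commonly illustrated with word count. This is a single-process Python sketch of the map, shuffle, and reduce phases, not Hadoop's Java API: each input line stands in for one input split, and shuffle() imitates the group-by-key step the framework performs between map and reduce. The function names are illustrative only.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lower-cased
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework does before reducing
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a real cluster the map calls run in parallel on different nodes;
    # here they simply run one after another
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For instance, word_count(["Hello Hadoop", "hello MapReduce"]) returns {"hello": 2, "hadoop": 1, "mapreduce": 1}. Because map_phase and reduce_phase only see their own inputs, either phase could be spread across nodes, which is the property that lets MapReduce distribute work over a cluster.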