Big data: detailed introductory notes on HDFS (Hadoop 3.1.3)
There is a table of contents on the right side of the page; use it to jump to the section you want by title. If it is not on the right, look on the left.
Main article link: https://blog.csdn.net/grd_java/article/details/115639179
Chapter I: environment setup https://blog.csdn.net/grd_java/article/details/115693312
If you hav ...
Posted by algarve4me on Sat, 05 Mar 2022 05:22:59 +0100
Hadoop 07: introduction to SecondaryNameNode and DataNode
1, Introduction to SecondaryNameNode
We already met the SecondaryNameNode when analyzing the edits log files. Here is a short summary to reinforce the point.
The SecondaryNameNode is mainly responsible for periodically merging the contents of the edits file into the fsimage.
This merge operation is called a checkpoint. When merging, the co ...
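The checkpoint idea described above can be sketched as a toy model. This is only an illustration under stated assumptions: the dictionary snapshot, the `(operation, path)` tuples, and the function name `checkpoint` are hypothetical and do not reflect the real fsimage or edits file formats.

```python
# Toy model of an HDFS checkpoint: replay the edit log onto the last
# namespace snapshot. This is NOT the real fsimage/edits on-disk format;
# all names and structures here are assumptions for illustration.

def checkpoint(fsimage, edits):
    """Return a new snapshot with every logged operation applied in order."""
    merged = dict(fsimage)  # copy, so the old snapshot stays untouched
    for op, path in edits:
        if op == "mkdir":
            merged[path] = "dir"
        elif op == "create":
            merged[path] = "file"
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


old_image = {"/": "dir", "/tmp": "dir"}
edit_log = [("mkdir", "/data"), ("create", "/data/a.txt"), ("delete", "/tmp")]

new_image = checkpoint(old_image, edit_log)
merged_edits = len(edit_log)
edit_log = []  # once merged, the old edits are no longer needed

print(new_image)
```

The point of the merge is that the snapshot plus a short edit log is faster to load at startup than a snapshot plus a very long edit log.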
Posted by abselect on Wed, 02 Mar 2022 01:47:02 +0100
HDFS high availability architecture
First, set up three virtual machines (used here for demonstration). For the setup process, please refer to this article:
Using virtual machines to build a fully distributed Hadoop cluster - CSDN blog
After completing the fully distributed Hadoop setup in the previous article, you can do the ...
Posted by antileon on Sat, 26 Feb 2022 14:10:24 +0100
Hive tutorial (06) - Hive SerDe serialization and deserialization
01 introduction
The previous tutorials gave a preliminary understanding of Hive's data model, data types and operation commands. Interested readers can refer to:
Hive tutorial (01) - getting to know Hive
Hive tutorial (02) - Hive installation
Hive tutorial (03) - Hive data model
Hive tutorial (04) - Hive data types
Hive tutorial (05) ...
Posted by Jackanape on Tue, 22 Feb 2022 04:24:36 +0100
Understanding Hudi's core concepts from its persisted files
[overview]
This is the first article in the Hudi series. It starts by deepening the understanding of the core concepts through the stored file format; later articles will gradually cover usage (writing to Hudi from Spark/Flink, syncing Hudi to Hive, etc.) and principles (compaction mechanism, indexing, clustering, etc.)
[what is a d ...
Posted by QbertsBrother on Sun, 20 Feb 2022 05:10:02 +0100
Hadoop pseudo distributed cluster installation and deployment
Deploy
Download the installation package
Upload and unzip the installation package
Configure environment variables
Modify the configuration files
Format HDFS
Modify the script files
Start and verify
Stop the cluster
Note:
1. The JDK environment is assumed to be installed and configured
2. The CentOS7 Linux environment is assumed to be installed and configured
3. Change the curre ...
Posted by MasterACE14 on Sat, 19 Feb 2022 21:33:02 +0100
MapReduce processing pictures
Reference link 1, Reference link 2. The code comes from link 2, with my own modifications. My skill is limited, so please point out any mistakes.
Hadoop 3.2.1, CentOS 7: write the code on Windows, package it, and submit it to the Hadoop cluster on CentOS to run. Idea: put the pictures on HDFS, and then write the path of each im ...
Posted by benyhanna on Fri, 18 Feb 2022 06:16:31 +0100
Using docker to build Hadoop cluster
I. Environment:
1. Ubuntu 20
2. Hadoop 3.1.4
3. JDK 1.8_301
II. Specific steps
Pull the latest version of the ubuntu image
Use a mount to transfer the JDK, Hadoop and other installation packages to the mount directory, either through xftp or with the command-line scp command.
Enter the ubuntu container: docker exec -it container id /bin/bash
Update a ...
Posted by The_Assistant on Wed, 16 Feb 2022 14:32:26 +0100
Big data Hadoop installation and configuration
Big data, Spark, Hadoop, python
1, Hadoop pseudo distributed configuration
1. Create Hadoop user:
sudo useradd -m hadoop -s /bin/bash # Create the hadoop user with a home directory and bash as its shell
sudo passwd hadoop # Set the password for the hadoop user
sudo adduser hadoop sudo # Add hadoop to the sudo group (administrator privileges)
Log out and log in ...
Posted by jamesh on Sat, 12 Feb 2022 04:41:59 +0100
Distributed computing framework Map/reduce
Introduction:
MapReduce is a cluster-based, high-performance parallel computing platform, a software framework for running parallel computations, and a parallel programming model and method.
Characteristics:
① Distributed and reliable. Operations on the data set are distributed to multiple nodes in the cluster to ac ...
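As a rough illustration of the programming model mentioned above, here is a minimal single-process word count split into map, shuffle and reduce steps. It is a sketch only: the function names are hypothetical, nothing is actually distributed, and it does not use the Hadoop API.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce programming model.
# All function names are illustrative assumptions, not Hadoop APIs.

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real cluster the map calls would run on different nodes over different input splits, and the shuffle would move data across the network; the structure of the three steps stays the same.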
Posted by warren on Thu, 10 Feb 2022 19:39:51 +0100