2. HDFS Architecture

Posted by Nilanka on Mon, 14 Oct 2019 05:24:03 +0200


I. Overview of HDFS System Composition

HDFS is a distributed file system suited to write-once, read-many scenarios. It contains the following roles:

NameNode(nn):
Stores file metadata such as the file name, directory structure, and file attributes, as well as the block list of each file and the DataNodes where each block resides. It also responds to client read and write operations on HDFS, such as creating directories and uploading files, and it records these operations in edit logs.

DataNode(dn):
Stores block data, and the checksums of that block data, in the local file system.

SecondaryNameNode(snn):
An auxiliary daemon that monitors the status of HDFS and takes snapshots of the HDFS metadata at intervals. It offloads checkpointing work from the NameNode, but it is not a backup NameNode.
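
As a quick illustration of these roles working together, here is a minimal client session sketch (the directory and file names match the upload examined later in this post):

//Metadata operations go to the NameNode; block data flows to the DataNodes
hdfs dfs -mkdir /input                        //create a directory (metadata change only)
hdfs dfs -put jdk-8u144-linux-x64.tar.gz /    //upload a file (blocks written to DataNodes)
hdfs dfs -ls /                                //listing answered from NameNode metadata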

II. HDFS NameNode

Its main responsibility is to manage all the nodes of HDFS:
1. Respond to client requests against HDFS, such as create, delete, and query operations.
2. Manage and maintain the HDFS metadata and its edit logs.

The NameNode creates a dfs/name/ directory under the hadoop.tmp.dir directory specified in core-site.xml.
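
If you are not sure where this directory lives on your cluster, you can query the effective configuration value (a sketch; the output depends on your core-site.xml):

//Print the effective value of hadoop.tmp.dir from the loaded configuration
hdfs getconf -confKey hadoop.tmp.dir

With that located, let's look at the structure of this directory.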

[root@bigdata121 tmp]# tree dfs/name
dfs/name
├── current
│   ├── edits_0000000000000000001-0000000000000000002
│   ├── edits_0000000000000000003-0000000000000000004
│   ├── edits_0000000000000000005-0000000000000000006
│   ├── edits_0000000000000000007-0000000000000000008
│   ├── edits_0000000000000000009-0000000000000000009
│   ├── edits_0000000000000000010-0000000000000000011
│   ├── edits_0000000000000000012-0000000000000000013
│   ├── edits_0000000000000000014-0000000000000000015
│   ├── edits_0000000000000000016-0000000000000000017
│   ├── edits_0000000000000000018-0000000000000000019
│   ├── edits_0000000000000000020-0000000000000000021
│   ├── edits_0000000000000000022-0000000000000000024
│   ├── edits_0000000000000000025-0000000000000000026
│   ├── edits_inprogress_0000000000000000027
│   ├── fsimage_0000000000000000024
│   ├── fsimage_0000000000000000024.md5
│   ├── fsimage_0000000000000000026
│   ├── fsimage_0000000000000000026.md5
│   ├── seen_txid
│   └── VERSION
└── in_use.lock

The functions of these files and directories are as follows:

1. current

It stores the metadata and the edit logs for the data kept in HDFS.

(1) edits file

It is a binary file that records the add, delete, and modify operations performed on HDFS, similar to MySQL's binary log. The file edits_inprogress_xxxx is the latest edits log, the one currently in use.
You can view the contents of an edits file with the hdfs oev command:

//Format: hdfs oev -i <input file> -o <output file> (XML format)
[root@bigdata121 current]# hdfs oev -i edits_inprogress_0000000000000000038 -o /tmp/edits_inprogess.xml

[root@bigdata121 current]# cat /tmp/edits_inprogess.xml
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>   The operation type; here it marks the start of a log segment
    <DATA>
      <TXID>38</TXID>  The transaction ID of the operation; it is unique
    </DATA>
  </RECORD>
</EDITS>

An OP_ADD_BLOCK record from earlier in the log, produced by a file upload, looks like this:

<RECORD>
    <OPCODE>OP_ADD_BLOCK</OPCODE>     //Block allocation during a file upload
    <DATA>
      <TXID>34</TXID>
      <PATH>/jdk-8u144-linux-x64.tar.gz._COPYING_</PATH>
      <BLOCK>
        <BLOCK_ID>1073741825</BLOCK_ID>
        <NUM_BYTES>134217728</NUM_BYTES>
        <GENSTAMP>1001</GENSTAMP>
      </BLOCK>
      <BLOCK>
        <BLOCK_ID>1073741826</BLOCK_ID>
        <NUM_BYTES>0</NUM_BYTES>
        <GENSTAMP>1002</GENSTAMP>
      </BLOCK>
      <RPC_CLIENTID></RPC_CLIENTID>
      <RPC_CALLID>-2</RPC_CALLID>
    </DATA>
  </RECORD>

(2) fsimage file

The metadata file for the data in HDFS. It records the information of every data block in the HDFS file system, but it is not necessarily up to date: the edits files must be merged into it periodically for it to reflect the latest state. You can view the contents of a fsimage file with the hdfs oiv command:

//Format: hdfs oiv -p <output format> -i <input file> -o <output file>
[root@bigdata121 current]# hdfs oiv -p XML -i fsimage_0000000000000000037 -o /tmp/fsimage37.xml

[root@bigdata121 current]# cat /tmp/fsimage37.xml
<?xml version="1.0"?>
<fsimage><version><layoutVersion>-63</layoutVersion><onDiskVersion>1</onDiskVersion><oivRevision>17e75c2a11685af3e043aa5e604dc831e5b14674</oivRevision></version>
<NameSection><namespaceId>1780930535</namespaceId><genstampV1>1000</genstampV1><genstampV2>1002</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1073741826</lastAllocatedBlockId><txid>37</txid></NameSection>
<INodeSection><lastInodeId>16387</lastInodeId><numInodes>3</numInodes><inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1558145602785</mtime><permission>root:supergroup:0755</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16386</id><type>DIRECTORY</type><name>input</name><mtime>1558105166840</mtime><permission>root:supergroup:0755</permission><nsquota>-1</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16387</id><type>FILE</type><name>jdk-8u144-linux-x64.tar.gz</name><replication>2</replication><mtime>1558145602753</mtime><atime>1558145588521</atime><preferredBlockSize>134217728</preferredBlockSize><permission>root:supergroup:0644</permission><blocks><block><id>1073741825</id><genstamp>1001</genstamp><numBytes>134217728</numBytes></block>
<block><id>1073741826</id><genstamp>1002</genstamp><numBytes>51298114</numBytes></block>
</blocks>
<storagePolicyId>0</storagePolicyId></inode>
</INodeSection>
<INodeReferenceSection></INodeReferenceSection><SnapshotSection><snapshotCounter>0</snapshotCounter><numSnapshots>0</numSnapshots></SnapshotSection>
<INodeDirectorySection><directory><parent>16385</parent><child>16386</child><child>16387</child></directory>
</INodeDirectorySection>
<FileUnderConstructionSection></FileUnderConstructionSection>
<SecretManagerSection><currentId>0</currentId><tokenSequenceNumber>0</tokenSequenceNumber><numDelegationKeys>0</numDelegationKeys><numTokens>0</numTokens></SecretManagerSection><CacheManagerSection><nextDirectiveId>1</nextDirectiveId><numDirectives>0</numDirectives><numPools>0</numPools></CacheManagerSection>
</fsimage>

The fsimage records quite detailed information: file metadata such as permissions, timestamps, and replication, plus the block list of each file. For example, jdk-8u144-linux-x64.tar.gz above is stored as one full 134217728-byte (128 MB) block plus one 51298114-byte block.

(3) seen_txid

A txid is similar to an event ID: an identifier assigned to each operation. This file records the next txid to be used; the last txid here was 37, so the file records 38.
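
seen_txid is a plain-text file, so you can simply cat it (the value matches the state described above):

[root@bigdata121 current]# cat seen_txid
38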

(4) The relationship between fsimage and edits file naming

edits files:
edits files are named in the form edits_00000xx-000000xxx, which indicates the range of txids recorded in that edits file. edits_inprogress_00000xxx is the file currently receiving new transactions.

fsimage files:
A file named fsimage_000000xxx contains all transactions up to and including that txid. Note that the edits files are merged into the fsimage only when a checkpoint condition is triggered; otherwise they are not merged. So in general, the txids covered by the edits files are larger than the fsimage's.
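
A checkpoint can also be forced by hand. A minimal sketch, assuming admin rights on the cluster (these are standard hdfs dfsadmin commands; saveNamespace only works in safe mode):

//Force the NameNode to merge the current edits into a new fsimage
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave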

2. in_use.lock

This file locks the metadata directory to prevent the current machine from starting multiple NameNode processes at the same time; only one NameNode can run against this directory.

III. HDFS DataNode

The DataNodes of HDFS mainly store the block files of the data. A dfs/data directory is created under the configured directory. Take a look at the directory structure:

[root@bigdata122 dfs]# tree data
data
├── current
│   ├── BP-1130553825-192.168.50.121-1557922928723
│   │   ├── current
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir0
│   │   │   │           ├── blk_1073741825
│   │   │   │           ├── blk_1073741825_1001.meta
│   │   │   │           ├── blk_1073741826
│   │   │   │           └── blk_1073741826_1002.meta
│   │   │   ├── rbw
│   │   │   └── VERSION
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock

In HDFS, files are split into large data blocks for storage.
The default block size in Hadoop 1.x is 64 MB.
The default block size in Hadoop 2.x is 128 MB.
Hadoop 3.x adds erasure coding as an alternative to plain multi-replica storage.
For details, see https://www.cnblogs.com/basenet855x/p/7889994.html
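
The block size is controlled by the dfs.blocksize property. A sketch for checking the effective value (the default shown assumes Hadoop 2.x):

//Print the effective block size in bytes (134217728 bytes = 128 MB, the 2.x default)
hdfs getconf -confKey dfs.blocksize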

The blk_xxxxx files in the directory above are the actual block files; each one is at most the configured block size. The accompanying blk_xxxxx_yyyy.meta files hold the checksums for the blocks.
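
To see how a file maps onto these block files, fsck can list a file's blocks and their locations (a sketch; the path is the file uploaded earlier):

//List the blocks of a file and the DataNodes holding each replica
hdfs fsck /jdk-8u144-linux-x64.tar.gz -files -blocks -locations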

IV. HDFS Secondary NameNode

An auxiliary daemon that monitors HDFS status and assists the NameNode (it is not a standby NameNode). Its main job is to merge the edits files into the fsimage file:
1. It merges the edits files into the fsimage when the checkpoint interval elapses (3600 seconds by default) or when the edits file reaches 64M (see the sketch after this list).
2. After the edits are merged into the fsimage, the edits file can be emptied.
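
Both triggers are configurable. A sketch for inspecting them on a Hadoop 2.x cluster (note that in 2.x the size-based trigger is expressed as a transaction count rather than a byte size):

//Checkpoint interval in seconds (default 3600)
hdfs getconf -confKey dfs.namenode.checkpoint.period
//Number of uncheckpointed transactions that also triggers a checkpoint
hdfs getconf -confKey dfs.namenode.checkpoint.txns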
