Apache Hadoop deployment: HDFS/YARN/MR configuration

Contents: Hadoop configuration (non-HA); configuring hadoop-env.sh, HDFS, YARN, MapReduce, and the worker file; startup and validation; problems. Hadoop is a distributed, highly available batch processing framework. Hadoop for CDH comes with other components such as HBase, H ...

Posted by kwilameiya on Tue, 23 Jun 2020 03:53:38 +0200

Building a Hadoop distributed environment

I wrote these notes with reference to Lin Ziyu's teaching materials; see the Database Laboratory of Xiamen University for details. Environment for this personally built Hadoop platform: three Ubuntu 14.04 64-bit machines, JDK 1.8, Hadoop 2.6.5 (Apache). 1. Hadoop preparation before instal ...

Posted by caspert_ghost on Sun, 21 Jun 2020 11:11:30 +0200

Real-time log analysis

Germeng Society AI: Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning practice (updated from time to time). 4.4 Real-time log analysis. Learning objective: master the connection between Flume and Kafka. We have collected the log data into Hadoop, but when doing real-time ana ...

Posted by pontiac007 on Thu, 18 Jun 2020 06:33:47 +0200

Detailed explanation of RowFilter of HBase Filter

This article introduces the Java and shell APIs of the HBase RowFilter in detail and includes sample code for reference. RowFilter filters on row keys, so consider it whenever you need to filter data by HBase rowkey. For details on comparators and how they work, please refer to the previous revision: Comparat ...

Posted by jonners on Tue, 05 May 2020 08:00:28 +0200

Hadoop 8-day course - day 6, MapReduce details

About Hive: a tool for translating SQL statements into MapReduce programs. Create table statement: CREATE TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User') COMMENT 'This is the page view table' PARTITIONED BY(dt STRING, country STRING) ROW FORMAT DELIMI ...

Posted by mainewoods on Wed, 29 Apr 2020 14:54:07 +0200

Apache Impala detailed installation

Reference article: Apache Impala detailed installation (the most complete list of pitfalls). Impala is an efficient SQL query tool provided by Cloudera that returns query results in real time. In the official tests it performs 10 to 100 times faster than Hive, and its SQL queries are even faster than Spark SQL. imp ...

Posted by deth4uall on Tue, 21 Apr 2020 09:18:04 +0200

Source code interpretation - (3) HBase examples MultiThreadedClientExample

Address: http://aperise.iteye.com/blog/2372534. Source code interpretation - (1) HBase client source code: http://aperise.iteye.com/blog/2372350. Source code interpretation - (2) HBase examples BufferedMutatorExample: http://aperise.iteye.com/blog/2372505. Source code interpretation - (3) HBase examples multithreadedc ...

Posted by point86 on Thu, 02 Apr 2020 02:58:55 +0200

Check the logs or run fsck in order to identify the missing blocks

The Hadoop version is 2.8.3. Today I found a strange problem: as shown in List-1 below, two file blocks are reported missing. List-1: There are 2 missing blocks. The following files may be corrupted: blk_1073857294 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar blk_1073857295 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f0 ...

Posted by tomfmason on Wed, 25 Mar 2020 15:31:06 +0100

The impact of large compressed files on the query performance of Impala

Hadoop, HDFS, MapReduce, and Impala are designed to store and process large volumes of data, on the order of terabytes or petabytes. A large number of small files hurts query performance because the NameNode must hold the metadata for every HDFS file. If a query touches many partitions or files at once, it needs to obtain the ...

Posted by shane0714 on Sat, 21 Mar 2020 10:54:37 +0100

Problems encountered in building big data infrastructure components (integration)

Contents: Hadoop: 1. because hostname cannot be resolved (1.1 Cause, 1.2 Solution, 1.3 Reference). Hive: 1. Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path (1.1 Solution); 2. Unable to instantiate org.a ...

Posted by a6000000 on Tue, 17 Mar 2020 02:31:41 +0100