Apache Hadoop deployment: HDFS/YARN/MR configuration
Contents
Hadoop configuration (non-HA)
Configuration
hadoop-env.sh
HDFS
YARN
MapReduce
Worker file
Startup and validation
Problems
Hadoop configuration (non-HA)
Hadoop is a distributed, highly available batch processing framework. The CDH distribution of Hadoop ships with other components such as HBase, H ...
Posted by kwilameiya on Tue, 23 Jun 2020 03:53:38 +0200
Building a Hadoop distributed environment
I wrote these notes myself, with reference to Lin Ziyu's teaching documents. For details, see the Database Laboratory of Xiamen University.
Environment of my self-built Hadoop platform: three Ubuntu 14.04 64-bit machines, JDK 1.8, Apache Hadoop 2.6.5
1. Hadoop preparation before instal ...
Posted by caspert_ghost on Sun, 21 Jun 2020 11:11:30 +0200
Real-time log analysis
Germeng Society
4.4 Real-time log analysis
Learning objectives
Master the connection between Flume and Kafka
We have collected the log data into Hadoop, but when doing real-time ana ...
Posted by pontiac007 on Thu, 18 Jun 2020 06:33:47 +0200
Detailed explanation of RowFilter of HBase Filter
This article describes in detail how to use the HBase RowFilter through the Java and shell APIs, and includes sample code for reference. RowFilter filters on row keys, so consider it whenever you need to filter data by HBase rowkey. For details on comparators and how they work, see the previous article: Comparat ...
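To make "filters based on row keys" concrete, here is a minimal pure-Python model of the idea. It is not the HBase API: `row_filter`, `OPS`, and the sample rows are hypothetical names invented for this sketch, and the only behavior it models is a byte-wise lexicographic comparison of each row key against a fixed value.

```python
import operator

# Comparison operators keyed by illustrative names (an assumption of this sketch).
OPS = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "LESS": operator.lt,
    "LESS_OR_EQUAL": operator.le,
    "GREATER": operator.gt,
    "GREATER_OR_EQUAL": operator.ge,
}

def row_filter(rows, op, value):
    """Keep the rows whose key satisfies `key <op> value`.

    Keys are bytes, so Python compares them lexicographically byte by byte.
    """
    compare = OPS[op]
    return {key: cells for key, cells in rows.items() if compare(key, value)}

# Made-up table contents: row key -> cell values.
ROWS = {
    b"row1": {"cf:a": b"1"},
    b"row2": {"cf:a": b"2"},
    b"row3": {"cf:a": b"3"},
}

kept = row_filter(ROWS, "GREATER_OR_EQUAL", b"row2")
print(sorted(kept))
```

Because only the key is inspected, a filter of this kind can decide whether to keep a row without looking at any of its cells.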
Posted by jonners on Tue, 05 May 2020 08:00:28 +0200
Hadoop 8-day course - day 6, MapReduce details
Hive
About Hive
Hive is a tool that translates SQL statements into MapReduce programs.
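As a rough illustration of what that translation means, the sketch below evaluates a query shaped like `SELECT country, COUNT(*) ... GROUP BY country` as separate map, shuffle, and reduce steps in plain Python. It does not show Hive's actual query plan; the function names and the sample rows are assumptions made up for this example.

```python
from collections import defaultdict

# Illustrative rows; the values are invented for this sketch.
ROWS = [
    {"userid": 1, "page_url": "/home", "country": "US"},
    {"userid": 2, "page_url": "/home", "country": "CN"},
    {"userid": 3, "page_url": "/cart", "country": "US"},
]

def map_phase(row):
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    yield row["country"], 1

def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate the values of one key; COUNT(*) becomes a sum of ones."""
    return key, sum(values)

def run_query(rows):
    pairs = [pair for row in rows for pair in map_phase(row)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

print(run_query(ROWS))
```

The GROUP BY column becomes the shuffle key and the aggregate function becomes the reducer, which is the general shape of the mapping from a SQL aggregation to a MapReduce job.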
Create table statement
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMI ...
Posted by mainewoods on Wed, 29 Apr 2020 14:54:07 +0200
Apache Impala detailed installation
Reference article: Apache Impala detailed installation (the most complete record of pitfalls)
Impala is an efficient SQL query tool provided by Cloudera that returns query results in real time. According to the official tests, it is 10 to 100 times faster than Hive, and its SQL queries are even faster than Spark SQL. imp ...
Posted by deth4uall on Tue, 21 Apr 2020 09:18:04 +0200
Source code interpretation - (3) HBase examples: MultiThreadedClientExample
Address: http://aperise.iteye.com/blog/2372534
Source code interpretation - (1) HBase client source code
http://aperise.iteye.com/blog/2372350
Source code interpretation - (2) HBase examples: BufferedMutatorExample
http://aperise.iteye.com/blog/2372505
Source code interpretation - (3) HBase examples multithreadedc ...
Posted by point86 on Thu, 02 Apr 2020 02:58:55 +0200
check the logs or run fsck in order to identify the missing blocks
The Hadoop version is 2.8.3.
Today I ran into a strange problem: as List-1 below shows, two file blocks are reported missing.
List-1
There are 2 missing blocks. The following files may be corrupted:
blk_1073857294 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar
blk_1073857295 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f0 ...
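When a report like List-1 lists many entries, it can help to pull out the block IDs and file paths programmatically. The following is a minimal sketch that assumes each entry has the "block ID, whitespace, absolute path" layout seen in List-1; `parse_missing_blocks` is a hypothetical helper, and the sample contains only the one entry that appears in full above.

```python
import re

# Matches lines of the form "blk_<digits> <absolute path>".
BLOCK_LINE = re.compile(r"^(blk_\d+)\s+(/\S+)$")

def parse_missing_blocks(report):
    """Return a list of (block_id, path) tuples found in the report text."""
    found = []
    for line in report.splitlines():
        match = BLOCK_LINE.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found

SAMPLE = """There are 2 missing blocks. The following files may be corrupted:
blk_1073857294 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar
"""

for block_id, path in parse_missing_blocks(SAMPLE):
    print(block_id, path)
```

The header line is skipped because it does not start with a block ID, so the same function can be fed the whole report text unchanged.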
Posted by tomfmason on Wed, 25 Mar 2020 15:31:06 +0100
The impact of large compressed files on the query performance of Impala
Hadoop/HDFS/MapReduce/Impala are designed to store and process large volumes of data, on the order of terabytes or petabytes. A large number of small files has a great impact on query performance, because the NameNode has to keep the metadata of every HDFS file. If a query touches many partitions or files at one time, it needs to obtain the ...
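A back-of-the-envelope calculation shows why the file count matters more than the data volume. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object, a 128 MiB block size, and files that each fit in one block; none of these figures come from the article, and the function names are made up for the example.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate heap used by file and block objects (directories are ignored)."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

def files_needed(total_bytes, file_size_bytes):
    """Number of files needed to hold total_bytes, rounded up."""
    return -(-total_bytes // file_size_bytes)

TIB = 1024 ** 4
MIB = 1024 ** 2

small = files_needed(TIB, 1 * MIB)    # 1 TiB stored as 1 MiB files
large = files_needed(TIB, 128 * MIB)  # 1 TiB stored as 128 MiB files

print(small, namenode_heap_bytes(small))
print(large, namenode_heap_bytes(large))
```

Under these assumptions the same terabyte costs about 128 times more metadata when stored as 1 MiB files, which is the overhead the NameNode has to carry on every query that lists those files.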
Posted by shane0714 on Sat, 21 Mar 2020 10:54:37 +0100
Problems encountered in building big data infrastructure components (integration)
Contents
Hadoop
1. because hostname cannot be resolved
1.1 Cause
1.2 Solution
1.3 Reference
Hive
1. Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
1.1 Solution
2. Unable to instantiate org.a ...
Posted by a6000000 on Tue, 17 Mar 2020 02:31:41 +0100