No one is obliged to help you. You have to do everything yourself
Hadoop source code compilation
(1) CentOS networking
Configure CentOS so that it can reach the Internet, and confirm that ping works from the Linux virtual machine
Note: compile as the root user to avoid folder permission problems
(2) JAR package preparation (Hadoop source code, JDK 8, Maven, Ant, p ...
Posted by henryblake1979 on Tue, 30 Jun 2020 10:51:58 +0200
Hadoop configuration (non-HA)
Startup and Validation
Hadoop is a distributed, highly available batch processing framework. The CDH distribution of Hadoop comes with other components such as HBase, H ...
Posted by kwilameiya on Tue, 23 Jun 2020 03:53:38 +0200
I wrote this note with reference to Lin Ziyu's teaching materials. For details, see the Database Laboratory of Xiamen University
Environment of my self-built Hadoop platform: three Ubuntu 14.04 64-bit machines, JDK 1.8, Hadoop 2.6.5 (Apache)
1. Hadoop preparation before instal ...
Posted by caspert_ghost on Sun, 21 Jun 2020 11:11:30 +0200
AI: Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning practice (updated from time to time)
4.4 Real-time log analysis
Master the connection between Flume and Kafka
We have collected the log data into Hadoop, but when doing real-time ana ...
Posted by pontiac007 on Thu, 18 Jun 2020 06:33:47 +0200
This article describes in detail how to use HBase RowFilter through the Java and shell APIs, and includes sample code for reference. RowFilter filters rows by row key, so consider it whenever you need to filter HBase data by row key. For the details and principles of comparators, see the previous post: Comparat ...
Posted by jonners on Tue, 05 May 2020 08:00:28 +0200
A tool for translating SQL statements into MapReduce programs.
Create table statement
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMI ...
Posted by mainewoods on Wed, 29 Apr 2020 14:54:07 +0200
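The PARTITIONED BY clause in the excerpt above declares dt and country as partition columns. As a rough illustration, and assuming the conventional layout in which each partition lives under nested column=value directories, the Python sketch below builds the directory for one partition. The function name, the warehouse path, and the sample values are hypothetical and do not come from the post.

```python
def partition_path(table_dir, partition_values):
    """Build the directory for one partition.

    Assumes the usual nested column=value layout; `partition_values`
    is a list of (column, value) pairs in PARTITIONED BY order.
    """
    parts = [f"{column}={value}" for column, value in partition_values]
    return "/".join([table_dir.rstrip("/")] + parts)


# Hypothetical warehouse location and partition values, for illustration only.
example = partition_path(
    "/user/hive/warehouse/page_view",
    [("dt", "2020-04-29"), ("country", "US")],
)
print(example)
```

Because each partition maps to its own directory, a query that restricts dt and country only needs to read the matching directories.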
Reference article: Apache Impala detailed installation (covering the most complete set of pitfalls)
Apache Impala detailed installation
Impala is an efficient SQL query tool provided by Cloudera that returns query results in real time. According to the official tests, it is 10 to 100 times faster than Hive, and its SQL queries are even faster than Spark SQL. imp ...
Posted by deth4uall on Tue, 21 Apr 2020 09:18:04 +0200
The Hadoop version is 2.8.3
Today I ran into a strange problem, shown in List-1 below: two file blocks are reported as missing
There are 2 missing blocks. The following files may be corrupted:
blk_1073857295 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f0 ...
Posted by tomfmason on Wed, 25 Mar 2020 15:31:06 +0100
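When a report like the one above lists many blocks, it can help to pull out the block IDs and file paths programmatically. The following Python sketch assumes each affected entry is a line of the form "blk_<digits> <path>", as in the excerpt. The function name, the sample paths, and the second block ID are hypothetical.

```python
import re

# Matches lines of the form "blk_<digits> <path>", as in the listing above.
BLOCK_LINE = re.compile(r"^(blk_\d+)\s+(\S+)$")


def parse_missing_blocks(report):
    """Return (block_id, path) pairs found in a missing-block report."""
    pairs = []
    for line in report.splitlines():
        match = BLOCK_LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# The header line and the first block ID come from the post;
# the paths and the second block ID are made up for illustration.
sample = """There are 2 missing blocks. The following files may be corrupted:
blk_1073857295 /tmp/example/file-a
blk_1073857296 /tmp/example/file-b
"""

for block_id, path in parse_missing_blocks(sample):
    print(block_id, path)
```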
Hadoop, HDFS, MapReduce, and Impala are designed to store and process large volumes of data, on the order of terabytes or petabytes. A large number of small files hurts query performance, because the NameNode has to keep metadata for every HDFS file. If a query touches many partitions or files at once, it needs to obtain the ...
Posted by shane0714 on Sat, 21 Mar 2020 10:54:37 +0100
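One way to see where small files pile up is to count them per directory. The Python sketch below is only an illustration: it takes a list of (path, size) pairs, however they were obtained, and counts the files below a threshold in each parent directory. The 128 MB block size, the function name, and the sample listing are assumptions, not details from the post.

```python
from collections import Counter
import posixpath

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)


def small_files_per_dir(listing, threshold=BLOCK_SIZE):
    """Count files smaller than `threshold` bytes in each parent directory.

    `listing` is an iterable of (path, size_in_bytes) pairs.
    """
    counts = Counter()
    for path, size in listing:
        if size < threshold:
            counts[posixpath.dirname(path)] += 1
    return counts


# Hypothetical listing: two partitions, one of them made up of small files.
listing = [
    ("/data/t/dt=2020-03-20/part-0", 4 * 1024),
    ("/data/t/dt=2020-03-20/part-1", 8 * 1024),
    ("/data/t/dt=2020-03-21/part-0", 256 * 1024 * 1024),
]
print(small_files_per_dir(listing).most_common())
```

Directories with the highest counts are the natural candidates for compaction into fewer, larger files.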