2021-5-5 101701 Hive command
Chapter 1: Starting the Hadoop cluster to use Hive
Start HDFS HA
(check whether the NameNode web UI on port 50070 has started successfully)
Start the ZooKeeper service on each machine
zkServer.sh start
Start the HDFS service
sbin/start-dfs.sh
Start YARN HA
(check whether the ResourceManager web UI on port 8088 has started successfully)
Start YARN
sbin/start-yar ...
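The port checks in the steps above can also be scripted. Below is a minimal Python sketch. It assumes the default ports named in the text and uses a hypothetical host, `localhost`. Port 50070 is the Hadoop 2.x NameNode web UI port; Hadoop 3.x moved it to 9870. The script only confirms that each TCP port accepts connections. It does not check whether the service behind the port is healthy.

```python
import socket

# Hypothetical hosts; point these at your own NameNode and ResourceManager.
WEB_UIS = {
    "HDFS NameNode web UI": ("localhost", 50070),  # Hadoop 2.x default port
    "YARN ResourceManager web UI": ("localhost", 8088),
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return False


if __name__ == "__main__":
    for name, (host, port) in WEB_UIS.items():
        status = "listening" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port}: {status}")
```

A reachable port only shows that something is listening on it. Opening the web UI in a browser is still the real check that the NameNode or ResourceManager is working.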
Posted by mohabitar on Fri, 18 Feb 2022 15:01:55 +0100
SQL ability practice of deep love
@SQL skill improvement
MySQL index
At present I'm a junior in college. I want to work in data warehouse development, but the market's requirements for data development skills are also very high.
I have also interviewed at many small companies, and they all agreed that I lacked a deeper understanding of the workflow of a whole project and of how projects are expanded.
A ...
Posted by markmil2002 on Thu, 17 Feb 2022 12:54:18 +0100
Detailed explanation of ORC file structure
The overall structure of an ORC file is as follows:
Searching and indexing data in an ORC file is essentially three-level filtering: at the file level, the stripe level and the row group level. This way, the data that actually has to be scanned and read can be reduced to some row groups of some stripes, without scanning the whole file. ...
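The three levels of filtering can be illustrated with a toy model. In the sketch below, every level stores only the min/max of one integer column, and a reader skips any file, stripe or row group whose range cannot satisfy the predicate. The classes `OrcFile`, `Stripe` and `RowGroup` are hypothetical simplifications. They are not the ORC library's API. Real ORC files keep richer statistics in the file footer, stripe footers and row indexes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified model of ORC statistics: every level keeps only the
# min/max of a single integer column.


@dataclass
class RowGroup:
    min_val: int
    max_val: int
    rows: List[int]


@dataclass
class Stripe:
    min_val: int
    max_val: int
    row_groups: List[RowGroup]


@dataclass
class OrcFile:
    min_val: int
    max_val: int
    stripes: List[Stripe]


def may_contain(min_val: int, max_val: int, lo: int, hi: int) -> bool:
    """True if [min_val, max_val] overlaps the predicate range [lo, hi]."""
    return max_val >= lo and min_val <= hi


def scan(f: OrcFile, lo: int, hi: int) -> Tuple[List[int], int]:
    """Return rows with lo <= value <= hi and the number of row groups actually read."""
    result: List[int] = []
    groups_read = 0
    if not may_contain(f.min_val, f.max_val, lo, hi):  # level 1: skip the whole file
        return result, groups_read
    for stripe in f.stripes:
        if not may_contain(stripe.min_val, stripe.max_val, lo, hi):  # level 2: skip stripe
            continue
        for rg in stripe.row_groups:
            if not may_contain(rg.min_val, rg.max_val, lo, hi):  # level 3: skip row group
                continue
            groups_read += 1
            result.extend(v for v in rg.rows if lo <= v <= hi)
    return result, groups_read


def row_group(lo: int, hi: int) -> RowGroup:
    """Helper that builds a row group holding the consecutive values lo..hi."""
    return RowGroup(lo, hi, list(range(lo, hi + 1)))


# Demo file: 3 stripes x 2 row groups, values 0..299 stored in order.
DEMO = OrcFile(0, 299, [
    Stripe(0, 99, [row_group(0, 49), row_group(50, 99)]),
    Stripe(100, 199, [row_group(100, 149), row_group(150, 199)]),
    Stripe(200, 299, [row_group(200, 249), row_group(250, 299)]),
])

if __name__ == "__main__":
    rows, read = scan(DEMO, 120, 130)
    print(f"{len(rows)} matching rows, {read} of 6 row groups read")
```

Consider a predicate equivalent to `BETWEEN 120 AND 130`. The scan reads only one of the six row groups. A range entirely outside 0..299 is rejected at the file level.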
Posted by Sander on Mon, 14 Feb 2022 13:20:42 +0100
Community developer column | Maria Carrie: Linkis 1.0.2 installation and use guide
Original article publishing address: https://www.jianshu.com/p/d0e8b605c4ce
Click "Read the original text" or visit https://linkis.apache.org/#/ to learn more about Apache Linkis
Community developer: Maria Carrie
GitHub : mindflow94
This article mainly guides users through installing and deploying Linkis and DataSphere Studio, and t ...
Posted by BarmyArmy on Wed, 09 Feb 2022 08:12:25 +0100
Flink (56): Integrating Hive with Flink SQL (Flink advanced features)
Contents
0. Links to related articles
1. Introduction to integrating Hive with Flink SQL
2. Basic ways to integrate Hive
2.1. Persistent metadata
2.2. Using Flink to read and write Hive tables
3. Preparation
4. SQL CLI
5. Code demonstration
0. Links to related articles
Flink article summary
1. Introduction to integrating Hive with Flink SQL ...
Posted by sd9sd on Wed, 02 Feb 2022 20:39:17 +0100
Analyzing e-commerce website user operations with Hive in a Hadoop environment
1, Analysis objectives
1. Establish a daily operational metrics system; 2. Analyze the composition of existing users and of the users active each day; 3. Analyze users' repurchase behavior to guide subsequent repurchase-oriented operations.
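In Hive, objectives 1 and 3 would typically become `GROUP BY` / `COUNT(DISTINCT ...)` queries over the behavior table. The Python sketch below shows the same aggregation logic. The schema is an assumption, since the excerpt does not show the table columns. The sketch uses a hypothetical `Action` row with `user_id`, `behavior` and `date` fields, and it treats a buyer as a repeat buyer if they bought on at least two distinct days. Both the field names and that definition are illustrative choices, not the author's.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Action(NamedTuple):
    """One row of a hypothetical user_action table (columns assumed for illustration)."""
    user_id: str
    behavior: str  # assumed codes, e.g. "pv", "cart", "buy"
    date: str      # "YYYY-MM-DD"


def daily_active_users(actions: List[Action]) -> Dict[str, int]:
    """Distinct users with any action per day, one possible daily operational metric."""
    users_by_day: Dict[str, Set[str]] = defaultdict(set)
    for a in actions:
        users_by_day[a.date].add(a.user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def repurchase_rate(actions: List[Action]) -> float:
    """Share of buyers who bought on at least two distinct days (assumed definition)."""
    buy_days: Dict[str, Set[str]] = defaultdict(set)
    for a in actions:
        if a.behavior == "buy":
            buy_days[a.user_id].add(a.date)
    if not buy_days:
        return 0.0
    repeat_buyers = sum(1 for days in buy_days.values() if len(days) >= 2)
    return repeat_buyers / len(buy_days)


if __name__ == "__main__":
    sample = [
        Action("u1", "buy", "2021-12-01"),
        Action("u1", "buy", "2021-12-03"),
        Action("u2", "buy", "2021-12-01"),
        Action("u3", "pv", "2021-12-01"),
    ]
    print(daily_active_users(sample))  # {'2021-12-01': 3, '2021-12-03': 1}
    print(repurchase_rate(sample))     # 0.5
```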
2, Data description
User_info, the user information table:
user_action, the user behavior table:
3, Imp ...
Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100
HiveSQL & SparkSQL: using LEFT SEMI JOIN to optimize IN and EXISTS subqueries
Introduction to LEFT SEMI JOIN
The main use case of SEMI JOIN (equivalent to LEFT SEMI JOIN) is to replace IN/EXISTS subqueries, because LEFT SEMI JOIN is a more efficient implementation of them. Although its name contains LEFT, its effect is equivalent to an INNER JOIN, except that the join result only takes the columns in the orig ...
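The semantics can be shown with a small in-memory sketch. A semi join keeps only left-side columns, like an inner join restricted to the left table. It also keeps each left row at most once, even when the right side has several matches, which is why it matches IN/EXISTS. The Python functions below are illustrative only. Hive and Spark use hash- or sort-based join operators. The tables `ORDERS` and `VIP_USERS` are hypothetical sample data.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Nested-loop inner join: one output pair per matching (left, right) combination."""
    return [(l, r) for l in left for r in right
            if l[key] is not None and l[key] == r[key]]


def left_semi_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep each left row at most once if its key exists on the right side.

    Same result as  SELECT l.* FROM l WHERE l.key IN (SELECT key FROM r)
    or, in HiveQL,  SELECT l.* FROM l LEFT SEMI JOIN r ON (l.key = r.key).
    SQL NULL keys never match, hence the None checks.
    """
    right_keys = {r[key] for r in right if r[key] is not None}  # build side, hashed once
    return [l for l in left if l[key] is not None and l[key] in right_keys]


ORDERS = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 20},
    {"user_id": 3, "amount": 30},
    {"user_id": None, "amount": 40},
]
# User 1 appears twice on the right side.
VIP_USERS = [{"user_id": 1}, {"user_id": 1}, {"user_id": 3}, {"user_id": None}]

if __name__ == "__main__":
    print(len(inner_join(ORDERS, VIP_USERS, "user_id")))  # 3: user 1's order is duplicated
    print(left_semi_join(ORDERS, VIP_USERS, "user_id"))  # orders of users 1 and 3, once each
```

Because the right side only has to answer "does this key exist?", it can be reduced to a set of keys. This is the intuition behind the efficiency claim above.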
Posted by Waldir on Mon, 31 Jan 2022 15:40:32 +0100
DataSkew: a summary of data skew analysis and solutions in practice
Note that data skew should be distinguished from excessive data volume. Data skew means that a few tasks are assigned most of the data, so those few tasks run slowly. Excessive data means that every task is allocated a large amount of data, with little difference between tasks, so all tasks run slowly.
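One rough way to tell the two cases apart is to compare per-task record counts after the data has been partitioned by key. With skew, the largest task is far above the average. With excessive data, all tasks are large and roughly equal. The Python sketch below simulates hash partitioning with `zlib.crc32`. The thresholds `skew_ratio` and `per_task_limit` are hypothetical and would need tuning for a real job. The key `hot_key` is made-up sample data.

```python
import zlib
from statistics import mean
from typing import Iterable, List


def task_loads(keys: Iterable[str], num_tasks: int) -> List[int]:
    """Records per task when records are hash-partitioned by key, as in a shuffle."""
    loads = [0] * num_tasks
    for k in keys:
        # crc32 keeps the partitioning deterministic across runs
        loads[zlib.crc32(k.encode("utf-8")) % num_tasks] += 1
    return loads


def diagnose(loads: List[int], skew_ratio: float = 3.0,
             per_task_limit: int = 1_000_000) -> str:
    """Classify a stage as 'skew', 'excessive data' or 'ok' (thresholds are hypothetical)."""
    avg = mean(loads)
    if max(loads) > skew_ratio * avg:
        return "skew"            # a few tasks hold most of the data
    if avg > per_task_limit:
        return "excessive data"  # every task is large and roughly equal
    return "ok"


if __name__ == "__main__":
    # One hot key with 1,000 records plus 100 ordinary keys with one record each.
    keys = ["hot_key"] * 1000 + [f"k{i}" for i in range(100)]
    loads = task_loads(keys, num_tasks=10)
    print(loads, diagnose(loads))
```

In the demo, the task that receives `hot_key` holds at least 1,000 of the 1,100 records, so it is flagged as skew. Adding more tasks would not help here. Only a change in how the hot key is distributed would.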
What is data skew
In short, data s ...
Posted by n00b Saibot on Mon, 31 Jan 2022 15:19:02 +0100
Big data Hive in 2021: understanding Hive database and table operations (understand it in seconds)
The most detailed Hive article series on the web; bookmarking and following it is strongly recommended!
Future articles will list the contents of earlier articles to help you review the key knowledge points.
Contents
Previous articles in the series
Preface
Hive database and table operations
1, Database op ...
Posted by bbaker on Mon, 31 Jan 2022 04:41:33 +0100
Using UDFs in Hive and Impala
11.1 Introduction to the experimental environment
The cluster environment is running normally
Hive and Impala services are installed in the cluster
Operating system: Redhat 6.5
CDH and CM versions are 5.11.1
An EC2 user with sudo permission is used for the operations
11.2 UDF development: using IntelliJ
Use IntelliJ to develop Hive's ...
Posted by Design on Sun, 30 Jan 2022 23:08:38 +0100