2021-5-5 101701 Hive command

Chapter 1: Using Hive. Start the Hadoop cluster. Start HDFS HA (check that the web page on port 50070 comes up): start the ZooKeeper service on each machine with zkServer.sh start, then start the HDFS service with sbin/start-dfs.sh. Start YARN HA (check that the web page on port 8088 comes up): start YARN with sbin/start-yar ...

Posted by mohabitar on Fri, 18 Feb 2022 15:01:55 +0100
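
The excerpt above asks the reader to confirm that the web pages on ports 50070 and 8088 come up after starting HDFS and YARN. A minimal sketch of that check in Python is shown here. The helper names (`port_is_open`, `check_cluster`), the service labels, and the `localhost` host are assumptions made for illustration and are not part of the original post. An open TCP port only shows that something is listening; it does not prove the service is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Port numbers come from the excerpt; the labels are descriptive only.
WEB_UI_PORTS = {
    "HDFS web page": 50070,
    "YARN web page": 8088,
}


def check_cluster(host):
    """Map each label to True/False depending on whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in WEB_UI_PORTS.items()}


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the node that serves the web pages.
    for name, ok in check_cluster("localhost").items():
        print(f"{name}: {'listening' if ok else 'not reachable'}")
```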

SQL ability practice of deep love

@SQL skill improvement MySQL index I'm currently a junior in college and want to work in data warehouse development, but the market sets a particularly high bar for data development skills. I have interviewed with many small companies, and they agreed that I lacked a deeper understanding of the overall project workflow and project expansion A ...

Posted by markmil2002 on Thu, 17 Feb 2022 12:54:18 +0100

Detailed explanation of ORC file structure

The overall structure of an ORC file is as follows: Searching and indexing data in the ORC file structure is essentially three-level filtering: file level, Stripe level and Row Group level. In this way, the data that actually has to be scanned and read can be reduced to some of the Row Groups in some of the Stripes, without scanning the whole file. ...

Posted by Sander on Mon, 14 Feb 2022 13:20:42 +0100
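
The three-level filtering described in the ORC excerpt above can be sketched as a toy model in Python. This is not the ORC reader. The dictionary layout, the function names, and the use of simple min/max statistics as the filter at each level are assumptions made for illustration.

```python
def stats_of(values):
    """Min/max summary kept at each level of the toy file."""
    return (min(values), max(values))


def may_contain(stats, target):
    """A level can be skipped when target falls outside its [min, max] range."""
    lo, hi = stats
    return lo <= target <= hi


def build_file(values, rows_per_group=4, groups_per_stripe=2):
    """Split values into row groups, group those into stripes, and record stats."""
    groups = [values[i:i + rows_per_group]
              for i in range(0, len(values), rows_per_group)]
    stripes = [groups[i:i + groups_per_stripe]
               for i in range(0, len(groups), groups_per_stripe)]
    return {
        "stats": stats_of(values),
        "stripes": [
            {
                "stats": stats_of([v for g in stripe for v in g]),
                "row_groups": [{"stats": stats_of(g), "rows": g} for g in stripe],
            }
            for stripe in stripes
        ],
    }


def find_equal(toy_file, target):
    """Return (matching values, number of rows actually scanned)."""
    matches, rows_scanned = [], 0
    if not may_contain(toy_file["stats"], target):      # file-level filter
        return matches, rows_scanned
    for stripe in toy_file["stripes"]:
        if not may_contain(stripe["stats"], target):    # stripe-level filter
            continue
        for group in stripe["row_groups"]:
            if not may_contain(group["stats"], target): # row-group-level filter
                continue
            rows_scanned += len(group["rows"])
            matches.extend(v for v in group["rows"] if v == target)
    return matches, rows_scanned


toy_file = build_file(list(range(1, 17)))  # 16 sorted values, 2 stripes, 4 row groups
print(find_equal(toy_file, 11))   # ([11], 4): only one row group of 4 rows is read
print(find_equal(toy_file, 100))  # ([], 0): rejected at the file level
```

With sorted input, each level's range is narrow, so most of the file is skipped. On unsorted data the ranges overlap and fewer row groups can be pruned.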

Community developer column | Linkis 1.0.2 installation and use guide by Maria Carrie

Original article publishing address: https://www.jianshu.com/p/d0e8b605c4ce Visit https://linkis.apache.org/#/ to learn more about Apache Linkis. Community developer: Maria Carrie, GitHub: mindflow94. This article mainly guides users through installing and deploying Linkis and DataSphere Studio, and t ...

Posted by BarmyArmy on Wed, 09 Feb 2022 08:12:25 +0100

Flink (56): Flink advanced features: integrating FlinkSQL with Hive

Contents: 0. Links to related articles 1. Introduction to FlinkSQL integration with Hive 2. Basic ways of integrating Hive 2.1. Persistent metadata 2.2. Using Flink to read and write Hive tables 3. Preparation 4. SQL CLI 5. Code demonstration 0. Links to related articles: Flink article summary 1. Introduction to FlinkSQL integration with Hive ...

Posted by sd9sd on Wed, 02 Feb 2022 20:39:17 +0100

Implementation of e-commerce website user operation analysis using Hive tool in Hadoop environment

1. Analysis goals: 1. Establish a daily operation index system; 2. Analyze the composition of existing users and count the composition of daily users; 3. Analyze users' repurchase behavior to guide subsequent repurchase operations. 2. Data description: user_info (user information table); user_action (user behavior table). 3. Imp ...

Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100

Hive SQL & Spark SQL -- using LEFT SEMI JOIN to optimize IN and EXISTS subqueries

Introduction to LEFT SEMI JOIN: The main use case for SEMI JOIN (equivalent to LEFT SEMI JOIN) is to replace EXISTS and IN subqueries. LEFT SEMI JOIN is a more efficient implementation of IN/EXISTS subqueries. Although LEFT SEMI JOIN contains LEFT, its effect is similar to INNER JOIN, but the JOIN result only takes the columns in the orig ...

Posted by Waldir on Mon, 31 Jan 2022 15:40:32 +0100
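
To make the LEFT SEMI JOIN excerpt above concrete, here is a small Python sketch of the semantics, not of how Hive or Spark execute the join. The tables `users` and `orders` and the function names are hypothetical. The sketch relies on the standard semi join definition: each left row is returned at most once, with only the left table's columns, whenever at least one matching right row exists.

```python
def inner_join(left, right, key):
    """One output row per matching (left, right) pair, with columns from both sides."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def left_semi_join(left, right, key):
    """Keep each left row once if its key exists on the right; left columns only.

    This has the same meaning as:
    SELECT * FROM left WHERE key IN (SELECT key FROM right)
    """
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]


# Hypothetical sample data.
users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 3, "name": "cy"},
]
orders = [
    {"user_id": 1, "order_id": 10},
    {"user_id": 1, "order_id": 11},
    {"user_id": 3, "order_id": 12},
]

print(len(inner_join(users, orders, "user_id")))  # 3: user 1 appears twice
print(left_semi_join(users, orders, "user_id"))   # users 1 and 3, once each
```

The difference from INNER JOIN shows up when the right table has duplicate keys: the inner join repeats the left row, while the semi join does not.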

DataSkew -- Summary of data skew problem analysis and solution practice

Note that data skew should be distinguished from excessive data. Data skew means that a few tasks are assigned most of the data, so those few tasks run slowly. Excessive data means that every task is assigned a very large and roughly equal amount of data, so all tasks run slowly. What is data skew? In short, data s ...

Posted by n00b Saibot on Mon, 31 Jan 2022 15:19:02 +0100
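
The distinction drawn in the data skew excerpt above can be illustrated with a short Python sketch. The function names and sample keys are hypothetical, and `key % num_tasks` is only a stand-in for the hash partitioning a real engine would apply.

```python
def partition_sizes(keys, num_tasks):
    """Count how many records each task receives under modulo partitioning."""
    sizes = [0] * num_tasks
    for key in keys:
        sizes[key % num_tasks] += 1
    return sizes


def skew_ratio(sizes):
    """Largest task load divided by the average load; 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Data skew: one hot key (0) carries 9,000 of 10,000 records.
skewed_keys = [0] * 9000 + list(range(1, 1001))
skewed = partition_sizes(skewed_keys, num_tasks=4)
print(skewed, skew_ratio(skewed))    # [9250, 250, 250, 250] 3.7

# Excessive data: four times as many records, but spread evenly.
large_keys = list(range(40000))
large = partition_sizes(large_keys, num_tasks=4)
print(large, skew_ratio(large))      # [10000, 10000, 10000, 10000] 1.0
```

In the skewed case one task holds 9,250 records while the others hold 250, so the job waits on a single slow task. In the second case every task is equally heavy, which is a capacity problem, not a skew problem.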

Big data Hive in 2021: teach you how to understand Hive database and table operation (learn to count in seconds)

The most detailed Hive article series on the web; bookmarking and following it is strongly recommended! Later articles will list the previous articles in the series to help you review the key points. Contents: Previous articles in the series; Preface; Hive database and table operations; 1. Database op ...

Posted by bbaker on Mon, 31 Jan 2022 04:41:33 +0100

Using UDFs in Hive and Impala

11.1 Introduction to the experimental environment: The cluster environment is running normally; Hive and Impala services are installed in the cluster; operating system: RedHat 6.5; CDH and CM versions are 5.11.1; an EC2 user with sudo permission is used for operation. 11.2 UDF function development using IntelliJ: Use the IntelliJ tool to develop Hive's ...

Posted by Design on Sun, 30 Jan 2022 23:08:38 +0100