Flink (56): Integrating Flink SQL with Hive (Flink advanced features)
Contents
0. Links to related articles
1. Introduction to Flink SQL and Hive integration
2. Basic ways of integrating with Hive
2.1. Persisting metadata
2.2. Using Flink to read and write Hive tables
3. Preparation
4. SQL CLI
5. Code demonstration
0. Links to related articles
Flink article summary
1. Introduction to Flink SQL and Hive integration ...
Posted by sd9sd on Wed, 02 Feb 2022 20:39:17 +0100
Redis sorted sets: do you know how they are implemented under the hood? After reading this article, you will
Self-introduction
Hello everyone, I'm a skip list, an ordered data structure. Each of my nodes holds an array of pointers to other nodes, so I can jump to those nodes quickly. That's why I'm called a skip list.
My average time complexity is O(log N), and in the worst case it is O(N). In most cases, I can compare with the efficie ...
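The node layout described above can be sketched in Python as follows. This is a minimal illustration and not Redis's actual code: the names `Node`, `SkipList`, `MAX_LEVEL` and `P` are assumptions made for the example, and the sketch stores bare keys rather than scored members.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        # forward[i] is the next node on level i (the "array of pointers").
        self.forward = [None] * level


class SkipList:
    MAX_LEVEL = 16  # assumed cap on node height, chosen for this sketch
    P = 0.5         # assumed probability of promoting a node one level up

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.head = Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1                          # highest level currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node on level i whose key is smaller than key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        new_node = Node(key, new_level)
        # Splice the new node into every level it participates in.
        for i in range(new_level):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def contains(self, key):
        node = self.head
        # Start at the top level and drop down whenever the next key is too large.
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def to_list(self):
        # Level 0 links every node in sorted order.
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

The upper levels act as express lanes: a search skips over long runs of nodes before dropping to level 0, which is where the logarithmic average cost comes from.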
Posted by kb0000 on Wed, 02 Feb 2022 18:52:10 +0100
Python+API: a perfect match for reading public data
This article is translated from Python & APIs: A Winning Combo for Reading Public Data
Code for this article: https://github.com/realpython/materials/tree/master/consuming-apis-python?__s=kea6w26ii09uqhijmy0b
Learning to use different APIs is a magical skill, and many applica ...
Posted by p.persia69 on Wed, 02 Feb 2022 12:13:37 +0100
Migrating CM and its database (part 2)
9.1 Migrate the data of the original CM node to the new node
9.1.1 Back up the original CM node data
This step mainly backs up CM's monitoring data and management information. The data directories include:
/var/lib/cloudera-host-monitor
/var/lib/cloudera-service-monitor
/var/lib/cloudera-scm-server
/var/lib/cloudera-scm-eventserver
/var/lib/clou ...
Posted by cidesign on Wed, 02 Feb 2022 07:50:51 +0100
Analyzing e-commerce website user operations with Hive in a Hadoop environment
1, Analysis target
1. Establish a daily operation index system; 2. Analyze the composition of existing users and count the composition of daily users; 3. Analyze users' repurchase behavior to guide subsequent repurchase operations.
2, Data description
User_info: user information table. user_action: user behavior table.
3, Imp ...
Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100
Spark, the road to mastery - a detailed explanation of RDD creation
3.2 RDD programming
In Spark, an RDD is represented as an object, and RDDs are transformed through method calls on that object. After defining an RDD through a series of transformations, you can call an action to trigger its computation. An action can return results to the application (count, collect, etc.) or save data to a storage system ...
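The split between transformations and actions can be illustrated without Spark. The sketch below uses a hypothetical `ToyRDD` class (not the Spark API) that only records transformations and runs them when an action such as `collect` or `count` is called.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD, for illustration only (not Spark)."""

    def __init__(self, source, ops=()):
        self._source = source    # any re-iterable collection
        self._ops = tuple(ops)   # recorded transformations, not yet executed

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def _compute(self):
        # Build the pipeline of recorded operations over the source data.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the recorded pipeline and return a result.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []


def traced_square(x):
    calls.append(x)  # records when the function actually runs
    return x * x


odd_squares = ToyRDD(range(1, 6)).map(traced_square).filter(lambda v: v % 2 == 1)
print(calls)                   # still empty: no action has been called yet
print(odd_squares.collect())   # the action triggers the computation
```

Each action re-runs the recorded pipeline from the source, which loosely mirrors how an uncached RDD is recomputed per action.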
Posted by Matty999555 on Tue, 01 Feb 2022 16:47:04 +0100
The object-oriented part of Java big data development (package 1)
1. Classes and objects
1.1 What is an object?
Everything is an object; anything that objectively exists is an object.
1.2 What is object-oriented?
Focusing attention on objects is what object-oriented means.
For example, I'm going to the supermarket to buy fruit. Fruit is an object. I pay attention to its type, size, acidity a ...
Posted by fifin04 on Mon, 31 Jan 2022 17:27:15 +0100
[How to become a master of SQL] Level 4: integrity constraints
👨🎓 Blogger introduction:
IT Bond, known online as jeames007, with 10 years of hands-on DBA experience
Member of the China DBA Union (ACDU), currently working in database administration and programming
SQL is an almost essential skill for product and R&D positions at Internet companies, but if you only know SQL, you can't do anything. 1. If you are a da ...
Posted by raffielim on Mon, 31 Jan 2022 16:04:42 +0100
HiveSql&SparkSql -- using LEFT SEMI JOIN to optimize IN and EXISTS subqueries
Introduction to LEFT SEMI JOIN
The main use case for SEMI JOIN (equivalent to LEFT SEMI JOIN) is replacing IN and EXISTS subqueries. LEFT SEMI JOIN is a more efficient implementation of IN/EXISTS subqueries. Although its name contains LEFT, its effect is similar to INNER JOIN, but the JOIN result only takes the columns in the orig ...
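To make the semantics concrete, here is a small pure-Python sketch of what a left semi join returns compared with an inner join. The function names and the `users`/`orders` sample rows are hypothetical and are not taken from the article.

```python
def left_semi_join(left_rows, right_rows, key):
    # Keep a left row if at least one right row shares its key.
    # Only left columns are returned, and each left row appears at most once.
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


def inner_join(left_rows, right_rows, key):
    # One output row per matching (left, right) pair, with columns from both sides.
    return [
        {**l, **r}
        for l in left_rows
        for r in right_rows
        if l[key] == r[key]
    ]


# Hypothetical sample data.
users = [
    {"user_id": 1, "name": "a"},
    {"user_id": 2, "name": "b"},
    {"user_id": 3, "name": "c"},
]
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 1, "amount": 20},
    {"user_id": 3, "amount": 5},
]

print(left_semi_join(users, orders, "user_id"))  # users 1 and 3, once each
print(inner_join(users, orders, "user_id"))      # user 1 appears twice
```

The semi join behaves like `WHERE user_id IN (SELECT user_id FROM orders)`: a user with several orders is still returned once, whereas the inner join repeats that user for every matching order.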
Posted by Waldir on Mon, 31 Jan 2022 15:40:32 +0100
DataSkew -- Summary of data skew problem analysis and solution practice
Note that data skew should be distinguished from excessive data. Data skew means that a few tasks are assigned most of the data, so those few tasks run slowly. Excessive data means that every task is assigned a very large amount of data, with little difference between tasks, so all tasks run slowly.
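The distinction can be expressed as a rough check on per-task record counts. This is only a sketch: the function name `diagnose` and both thresholds (`skew_ratio`, `heavy_threshold`) are assumptions chosen for illustration, not values from the article.

```python
from statistics import median


def diagnose(task_sizes, skew_ratio=5.0, heavy_threshold=1_000_000):
    """Classify per-task record counts as skewed, uniformly heavy, or balanced.

    skew_ratio and heavy_threshold are illustrative assumptions.
    """
    typical = median(task_sizes)
    largest = max(task_sizes)
    # Data skew: the largest task dwarfs the typical task.
    if (typical > 0 and largest / typical >= skew_ratio) or (typical == 0 and largest > 0):
        return "data skew"
    # Excessive data: tasks are similar in size, but every one of them is large.
    if typical >= heavy_threshold:
        return "excessive data"
    return "balanced"


print(diagnose([100, 120, 90, 110, 50_000]))        # one task holds most records
print(diagnose([2_000_000, 2_100_000, 1_900_000]))  # all tasks are large
print(diagnose([100, 120, 90]))                     # small and even
```

The median is used as the "typical" task size because a single oversized task barely moves it, whereas it would inflate the mean.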
What is data skew
In short, data s ...
Posted by n00b Saibot on Mon, 31 Jan 2022 15:19:02 +0100