Big Data [Page 9] - Programmer Think - where programmers share thinking

Big Data

Flink (56): Flink advanced features - integrating FlinkSQL with Hive

Contents: 0. Links to related articles; 1. Introduction to FlinkSQL integration with Hive; 2. Basic ways of integrating Hive; 2.1. Persistent metadata; 2.2. Using Flink to read and write Hive tables; 3. Preparation; 4. SQL CLI; 5. Code demonstration. 0. Links to related articles: Flink article summary. 1. Introduction to FlinkSQL integration with Hive ...

Posted by sd9sd on Wed, 02 Feb 2022 20:39:17 +0100

Redis sorted sets: do you know how they are implemented under the hood? After reading this article, you will

Self-introduction: Hi folks, I'm a skip list, an ordered data structure. Each of my nodes holds an array of pointers to other nodes, so I can reach those nodes quickly. That's why I'm called a skip list. My average time complexity is O(log N); in the worst case it is O(N). In most cases, I can compare with the efficie ...

Posted by kb0000 on Wed, 02 Feb 2022 18:52:10 +0100
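
The skip list described in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not Redis's actual implementation: the class names, the `MAX_LEVEL` and `P` values, and the `seed` parameter are all assumptions chosen for the example.

```python
import random


class SkipNode:
    """A node holding a key and an array of forward pointers, one per level."""

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level


class SkipList:
    MAX_LEVEL = 16  # illustrative cap on node height (assumption)
    P = 0.5         # illustrative promotion probability (assumption)

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel, never holds data
        self.level = 1  # number of levels currently in use

    def _random_level(self):
        # Flip coins: each success promotes the new node one level higher.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node on level i that precedes the new key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        new_node = SkipNode(key, new_level)
        for i in range(new_level):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def contains(self, key):
        # Start at the top level and drop down whenever the next key is too big.
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def to_list(self):
        # Level 0 links every node in sorted order.
        out = []
        node = self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out


sl = SkipList(seed=1)
for k in [5, 1, 9, 3, 7]:
    sl.insert(k)
print(sl.to_list())  # [1, 3, 5, 7, 9]
```

The upper levels act as express lanes: a search skips over long runs of nodes before dropping to the bottom level, which is where the O(log N) average comes from. If the random heights happen to be unlucky, the search degrades toward a plain linked-list scan, which is the O(N) worst case.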

Python+API: a perfect match for reading public data

This article is translated from "Python & APIs: A Winning Combo for Reading Public Data". Article code: https://github.com/realpython/materials/tree/master/consuming-apis-python?__s=kea6w26ii09uqhijmy0b Learning to use different APIs is a magical skill, and many applica ...

Posted by p.persia69 on Wed, 02 Feb 2022 12:13:37 +0100

Migrating CM and the database (part 2)

9.1 Migrate the data of the original CM node to the new node. 9.1.1 Back up the original CM node data. This mainly backs up CM's monitoring data and management information. The data directories include: /var/lib/cloudera-host-monitor, /var/lib/cloudera-service-monitor, /var/lib/cloudera-scm-server, /var/lib/cloudera-scm-eventserver, /var/lib/clou ...

Posted by cidesign on Wed, 02 Feb 2022 07:50:51 +0100

Analyzing e-commerce website user operations with Hive in a Hadoop environment

1. Analysis goals: 1. Establish a daily operations metric system; 2. Analyze the composition of existing users and count the composition of daily users; 3. Analyze user repurchase behavior to guide subsequent repurchase operations. 2. Data description: User_info user information table; user_action user behavior table. 3. Imp ...

Posted by gljaber on Wed, 02 Feb 2022 06:13:48 +0100

Spark, the road to mastery - a detailed explanation of RDD creation

3.2 RDD programming. In Spark, an RDD is represented as an object, and RDDs are transformed through method calls on that object. After defining an RDD through a series of transformations, you can call actions to trigger its computation. An action can return results to the application (count, collect, etc.) or save data to the storage system ...

Posted by Matty999555 on Tue, 01 Feb 2022 16:47:04 +0100

Object-oriented part of Java big data development (package 1)

1. Classes and objects. 1.1 What is an object? Everything is an object; anything that exists objectively is an object. 1.2 What is object-oriented? Focusing on objects is called object-oriented. For example, I'm going to the supermarket to buy fruit. The fruit is an object. I pay attention to its type, size, acidity a ...

Posted by fifin04 on Mon, 31 Jan 2022 17:27:15 +0100

[How to become a master of SQL] Level 4: integrity constraints

👨‍🎓 Blogger introduction: IT Bond, known online as jeames007, has 10 years of hands-on DBA experience, is a member of the China DBA Union (ACDU), and currently works in DBA and programming. SQL is almost a required skill for product and R&D positions at Internet companies, but if you only know SQL, you can't do anything. 1. If you are a da ...

Posted by raffielim on Mon, 31 Jan 2022 16:04:42 +0100

HiveSql&SparkSql -- use LEFT SEMI JOIN to optimize IN and EXISTS subqueries

Introduction to LEFT SEMI JOIN. The main use case for SEMI JOIN (equivalent to LEFT SEMI JOIN) is replacing IN/EXISTS subqueries; LEFT SEMI JOIN is a more efficient implementation of an IN/EXISTS subquery. Although its name contains LEFT, its effect is equivalent to INNER JOIN, except that the JOIN result only takes the columns in the orig ...

Posted by Waldir on Mon, 31 Jan 2022 15:40:32 +0100
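
The semi-join behavior summarized in the excerpt above can be illustrated without a SQL engine. The following is a plain-Python sketch over lists of dicts, not Hive or Spark code; the function names and the sample `orders` and `vip` tables are hypothetical. It also shows one difference from an inner join that is easy to miss: a left row is returned at most once, however many right rows match it.

```python
def left_semi_join(left_rows, right_rows, key):
    """Keep each left row whose key appears on the right; output left columns only."""
    # Build the set of right-side keys once, skipping NULL-like keys,
    # which never match in SQL join conditions.
    right_keys = {row[key] for row in right_rows if row[key] is not None}
    return [row for row in left_rows if row[key] in right_keys]


def inner_join(left_rows, right_rows, key):
    """Naive inner join for comparison: one output row per matching pair."""
    return [
        {**l, **r}
        for l in left_rows
        for r in right_rows
        if l[key] is not None and l[key] == r[key]
    ]


# Hypothetical sample data.
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 20},
    {"user_id": 3, "amount": 30},
]
vip = [
    {"user_id": 1, "tier": "gold"},
    {"user_id": 1, "tier": "silver"},  # second match for user 1
    {"user_id": 3, "tier": "gold"},
]

semi = left_semi_join(orders, vip, "user_id")
inner = inner_join(orders, vip, "user_id")
print(len(semi), len(inner))  # 2 3
```

The semi join only needs to know whether a match exists, so it can stop at the first match per left row. That existence check is the same question an IN or EXISTS subquery asks.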

DataSkew -- Summary of data skew problem analysis and solution practice

Note that data skew should be distinguished from excessive data. Data skew means that a few tasks are assigned most of the data, so those few tasks run slowly; excessive data means that every task is allocated a very large amount of data, with little difference between tasks, so all tasks run slowly. What is data skew? In short, data s ...

Posted by n00b Saibot on Mon, 31 Jan 2022 15:19:02 +0100
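
The distinction drawn in the excerpt above, between skewed data and merely large data, can be made concrete by counting how many records each task receives. The sketch below is a toy model, not Spark or Hive code: the modulo partitioner, the integer keys, and the `skew_ratio` metric are assumptions made for illustration.

```python
def partition_sizes(keys, num_tasks):
    """Count the records each task receives under a simple modulo partitioner."""
    sizes = [0] * num_tasks
    for key in keys:
        sizes[key % num_tasks] += 1
    return sizes


def skew_ratio(sizes):
    """Largest task size divided by the mean task size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Skewed: one hot key (0) accounts for 9000 of 10000 records.
skewed_keys = [0] * 9000 + list(range(1000))
# Large but even: 10000 distinct keys spread across all tasks.
even_keys = list(range(10000))

skewed = partition_sizes(skewed_keys, 4)
even = partition_sizes(even_keys, 4)
print(skewed, skew_ratio(skewed))  # [9250, 250, 250, 250] 3.7
print(even, skew_ratio(even))      # [2500, 2500, 2500, 2500] 1.0
```

Both inputs contain the same number of records, but only the first is skewed: one task does most of the work while the others finish early. Adding tasks helps with the second case but not the first, because every record with the hot key still lands on the same task.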
