Big Data [Page 4] - Programmer Think - where programmers share thinking

Big Data

Flink 06: ProcessAPI (personal summary)

Statement: 1. *** 2. Because this is a personal summary, the article is written as concisely as possible. 3. If anything is wrong or inappropriate, please point it out. Side output ...

Posted by springo on Thu, 24 Feb 2022 06:54:29 +0100

Flink SQL 1.14 Queries - Window Join

Window Join (streaming): A window join adds the dimension of time to the join criteria themselves. In doing so, the window join joins the elements of two streams that share a common key and lie in the same window. The semantics of a window join are the same as those of the DataStream window join. For streaming queries ...
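
The join rule in this excerpt (same key, same window) can be illustrated with a minimal plain-Python sketch. This is not Flink's API. The function name `tumbling_window_join`, the `(timestamp, key, value)` tuple layout, integer timestamps, and the choice of fixed-size tumbling windows are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_join(left, right, window_size):
    """Inner-join two lists of (timestamp, key, value) tuples.

    Two elements are joined only if they share a key AND fall into the
    same tumbling window of length `window_size` (hypothetical helper,
    not part of Flink).
    """
    # Index the right-hand stream by (window start, key).
    buckets = defaultdict(list)
    for ts, key, value in right:
        window_start = ts - ts % window_size
        buckets[(window_start, key)].append(value)

    # Probe the index with each left-hand element.
    joined = []
    for ts, key, value in left:
        window_start = ts - ts % window_size
        for other in buckets.get((window_start, key), []):
            joined.append((window_start, key, value, other))
    return joined


left = [(1, "a", "L1"), (6, "a", "L2"), (7, "b", "L3")]
right = [(2, "a", "R1"), (8, "a", "R2"), (3, "b", "R3")]
result = tumbling_window_join(left, right, window_size=5)
```

Here `L3` and `R3` share the key `"b"` but fall into different windows (starting at 5 and 0), so they are not joined; sharing a key alone is not enough.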

Posted by ricroma on Thu, 24 Feb 2022 06:14:01 +0100

Business Data Diversion for Flink Real-Time Projects

In the previous article, we obtained the output streams of the business data: the output stream of DIM-layer dimension data and the output stream of dw ...

Posted by apervizi on Mon, 21 Feb 2022 18:40:24 +0100

Hudi for the Data Lake: A Quick Hudi Walkthrough

Contents: 0. Links to related articles 1. Compile the Hudi source code 1.1. Install Maven 1.2. Download and compile Hudi 2. Install HDFS 3. Install Spark 4. Run a Hudi program in spark-shell. This article mainly introduces the integrated use of native Apache Hudi with HDFS, Spark, etc. 0. Links to related articles: Summary of articles on basic know ...

Posted by mrman23 on Mon, 21 Feb 2022 06:05:15 +0100

Creating a Spark Project to Operate on Kudu Tables

Creating a table in Kudu from Spark. The Spark and Kudu integration supports: DDL operations (create / delete); native Kudu RDD; native Kudu data source for DataFrame integration; reading data from Kudu; insert / update / upsert / delete from Kudu; predicate pushdown; schema mapping between Kudu and Spark SQL. So far, we have heard of several contexts, such as Sp ...

Posted by mynameisbob on Sun, 20 Feb 2022 21:51:03 +0100

Day 26 of Learning Big Data - Supplementary IO Streams and Multithreading

(1) Summary of multithreading: 1. Overview of multithreading: Process: a running program; it is the independent unit by which the system allocates resources. Each process has its own memory space and resources. Thread: a single sequential flow of control within a process, ...

Posted by villas on Sun, 20 Feb 2022 19:16:41 +0100

Elasticsearch: Create a Cluster with Multiple Nodes - Elastic Stack 8.0

In my previous articles, "Elastic Stack 8.0 installation - protecting your Elastic Stack is now easier than ever" and "Elastic: use Docker to install Elastic Stack 8.0 and start using it", I described in detail how to install a single-node Elastic Stack. In today's tutorial, I explain in detail how to install a three-node Elasticsearch cluster. I will use Docke ...

Posted by chadbobb on Sun, 20 Feb 2022 08:09:52 +0100

Hive installation, deployment and management

Experimental environment: Linux Ubuntu 16.04. Prerequisites: 1) Java runtime environment deployed; 2) Hadoop 3.0.0 single-node deployment completed; 3) MySQL database installed. These prerequisites have already been prepared for you. Experimental content: Under the above prerequisites, ...

Posted by Homer30 on Sun, 20 Feb 2022 04:37:39 +0100

Flume Introduction: Deployment, Principles, and Use

Flume overview: Flume is a highly available, reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting massive volumes of log data. Flume is based on a streaming architecture and is flexible and simple. Flume's main function is to read the data from the server ...

Posted by CONFUSIONUK on Sat, 19 Feb 2022 16:46:18 +0100

Principle and application of Hadoop Technology

Hadoop data processing (sophomore practical training, 2020). 1. Project background: The training covers statistical analysis of automobile sales data. Through this project, we will deepen our understanding of the HDFS distributed file system and the MapReduce distributed parallel computing framework, learn to apply them skillfully, and experience the dev ...

Posted by Garcia on Sat, 19 Feb 2022 11:54:35 +0100
