Big Data [Page 28] - Programmer Think - where programmers share thinking

Big Data

Kafka Learning Notes: Producer Development in Java

In the previous article, Kafka Learning Notes (2): Understanding Kafka Cluster and Topic, we covered Kafka's terminology and the concepts around topics; refer to it for more detail. Next we'll use Java to call Kafka's API, starting with what we can do with the Producer. Add Dependency: First let's add Maven d ...

Posted by KCAstroTech on Fri, 15 Oct 2021 18:58:29 +0200

Custom partitions and sorting within partitions

Simple wordCount: Suppose our file contains the following data: spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop spark spark hive hadoo ...
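The word count described in this excerpt can be sketched in plain Python. This is only an illustration of the counting step, not the article's Spark code; the function name word_count and the three-line sample are assumptions, since the full input file is not shown.

```python
from collections import Counter


def word_count(lines):
    """Count how often each whitespace-separated word occurs."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Hypothetical sample mirroring the excerpt's data (the real file is not shown in full).
sample_lines = ["spark spark hive hadoop"] * 3
counts = word_count(sample_lines)

# Sort by descending count, breaking ties alphabetically.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for word, n in ranked:
    print(word, n)
```

In Spark the same idea is usually expressed as a split, a mapping of each word to a (word, 1) pair, and a reduction by key; the Counter above plays the role of that reduction on a single machine.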

Posted by dieselmachine on Fri, 15 Oct 2021 01:41:08 +0200

Pandas advanced tutorial: time processing

Brief introduction: Time is a data type used often in data processing. In addition to NumPy's datetime64 and timedelta64 types, pandas integrates functionality from other Python libraries, such as scikits.timeseries. Time classification: There are four time types in pandas. Date times: date and time, with time z ...

Posted by powlouk on Mon, 11 Oct 2021 02:24:05 +0200

Have you learned to build your own Kafka image for development and testing?

Preface: During development, debugging a feature often requires other software to run alongside it. For example, when working with Flink CDC, you need to import data from the MySQL binlog into Kafka and then into a Hudi data lake. Here is the problem: to do this, I need to start a MySQL instance, a Kafka instance, a YARN cluster, and an HDFS cluster so that the wh ...

Posted by jcanker on Sun, 10 Oct 2021 18:14:44 +0200

Spark transformation operators and a case study

Transformation operators. map: elements within the same partition are processed in order, while different partitions run in no particular order relative to each other (fetching the data in each partition one element at a time is powerful but inefficient). val result1: RDD[Int] = rdd.map(num => { println(num); num * 2 }) mapPartitions: fetch one partition at a time and cal ...
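The difference between the two operators can be sketched without Spark, modeling an RDD as a list of partitions. This is a minimal illustration under that assumption; map_elements, map_partitions, and the call counter are hypothetical helpers, not Spark APIs.

```python
from typing import Callable, Iterable, Iterator, List

Partition = List[int]


def map_elements(partitions: List[Partition],
                 f: Callable[[int], int]) -> List[Partition]:
    """Like map: f is called once for every element."""
    return [[f(x) for x in part] for part in partitions]


def map_partitions(partitions: List[Partition],
                   f: Callable[[Iterator[int]], Iterable[int]]) -> List[Partition]:
    """Like mapPartitions: f is called once per partition with an iterator."""
    return [list(f(iter(part))) for part in partitions]


# Count how many times each user function is invoked.
calls = {"map": 0, "mapPartitions": 0}


def double(x: int) -> int:
    calls["map"] += 1
    return x * 2


def double_all(it: Iterator[int]) -> Iterable[int]:
    calls["mapPartitions"] += 1
    return (x * 2 for x in it)


partitions = [[1, 2], [3, 4]]  # two partitions of two elements each
result1 = map_elements(partitions, double)
result2 = map_partitions(partitions, double_all)
print(result1, calls["map"])
print(result2, calls["mapPartitions"])
```

Both calls produce the same doubled values, but the per-element function runs four times while the per-partition function runs twice. That is why per-partition processing suits work with a fixed setup cost, such as opening a connection, at the price of handling a whole partition in one call.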

Posted by ciaran on Sun, 10 Oct 2021 12:07:26 +0200

Behind Li Jiaqi's lipstick sales: applying the SMS marketing model in e-commerce

In the beauty industry, if you want strong lipstick sales for your own brand, you need to attract the first batch of users when the brand launches new products. As technology advances, even with many lipstick shades available, different brands may over time end up with the same color, that is, what we oft ...

Posted by psychowolvesbane on Sat, 09 Oct 2021 12:04:58 +0200

Performance optimization

Performance optimization: related concepts. How should we understand the JDK, JRE, and JVM? JDK (Java Development Kit): the Java development toolkit; it compiles source code into .class files. JRE (Java Runtime Environment): the Java runtime environment; it can only run .class files and cannot compile them. JRE + ...

Posted by garrywinkler on Sat, 09 Oct 2021 08:14:00 +0200

Look at the Flink source code and learn Flink --- Flink state

Series contents: Look at the Flink source code and learn Flink; Look at the Flink source code and learn Flink --- Flink state. Preface. Tip: Here you can add the general contents to be recorded in this article. For example, with the continuous development of artificial intelligence, machine learning technology is becoming more and m ...

Posted by annihilate on Fri, 08 Oct 2021 11:14:35 +0200

ZKFC principle and source code analysis

Principle summary: NameNode active/standby switchover is implemented mainly by three components: ZKFailoverController, HealthMonitor, and ActiveStandbyElector. HealthMonitor monitors the health of the NameNode (NN): it starts a thread that sends RPC requests, determines the NN status from the responses, and notifies ZKFC through cal ...
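The monitoring pattern described in this excerpt, probing a service and notifying a listener when its status changes, can be sketched as follows. This is a simplified illustration, not Hadoop's implementation: the class HealthMonitorSketch, the State values, and the probe function are all hypothetical, and the background thread is replaced by explicit check_once() calls so the behavior is easy to follow.

```python
import enum
from typing import Callable, List


class State(enum.Enum):
    INITIALIZING = "initializing"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


class HealthMonitorSketch:
    """Hypothetical monitor: probes a service and reports state changes."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self._probe = probe  # stands in for the RPC health check
        self._callbacks: List[Callable[[State], None]] = []
        self.state = State.INITIALIZING

    def add_callback(self, callback: Callable[[State], None]) -> None:
        self._callbacks.append(callback)

    def check_once(self) -> None:
        """Run one probe; notify listeners only if the state changed."""
        try:
            healthy = self._probe()
        except Exception:
            healthy = False  # a failed probe counts as unhealthy
        new_state = State.HEALTHY if healthy else State.UNHEALTHY
        if new_state != self.state:
            self.state = new_state
            for callback in self._callbacks:
                callback(new_state)


# Simulated probe results: healthy twice, then a failure.
responses = iter([True, True, False])
seen: List[State] = []

monitor = HealthMonitorSketch(lambda: next(responses))
monitor.add_callback(seen.append)  # the listener plays the controller's role
for _ in range(3):
    monitor.check_once()
print([s.value for s in seen])
```

Notifying only on a state change keeps the listener from reacting to every successful probe; in a real deployment the loop would run periodically on its own thread.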

Posted by bob2006 on Fri, 08 Oct 2021 11:02:51 +0200

Spark advanced: using DataFrame and DataSet

Spark advanced (V): using DataFrame and DataSet. DataFrame is a programming abstraction provided by Spark SQL. Like an RDD, it is a distributed collection of data. Unlike an RDD, a DataFrame's data is organized into named columns, like a table in a relational database. In addition, a variety of data can be transformed into D ...

Posted by StormS on Thu, 07 Oct 2021 10:51:16 +0200
