Spark Learning Notes - core operators

Spark Learning Notes - core operator (2): the distinct operator

    /**
     * Return a new RDD containing the distinct elements in this RDD.
     */
    def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
      def removeDuplicatesInPartition(partition: Iterator[T]): Iterator[T] = {
        // Create an instance of extern ...
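For intuition, the general (shuffle) path of Spark's distinct maps each element x to a pair (x, null), calls reduceByKey((x, _) => x, numPartitions) and keeps only the keys, so duplicates are routed to the same partition and collapse to one. The plain-Python model below is only a sketch of that idea, not Spark code: the function name simulate_distinct, the list-of-lists representation of partitions and the hash(x) % num_partitions routing are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List


def simulate_distinct(partitions: Iterable[Iterable[Hashable]], num_partitions: int) -> List[List[Hashable]]:
    """Plain-Python model of RDD.distinct(numPartitions); hypothetical helper, not Spark code."""
    # "map(x => (x, null))": turn every element into a key-value pair with a dummy value.
    pairs = [(x, None) for part in partitions for x in part]

    # Shuffle: route each pair by the hash of its key, so equal elements
    # always end up in the same output partition.
    shuffled = defaultdict(list)
    for key, value in pairs:
        shuffled[hash(key) % num_partitions].append((key, value))

    # "reduceByKey((x, _) => x)" keeps one value per key inside each partition,
    # then "map(_._1)" keeps only the keys.
    result = []
    for pid in range(num_partitions):
        reduced = {}
        for key, value in shuffled[pid]:
            reduced.setdefault(key, value)
        result.append(list(reduced.keys()))
    return result


if __name__ == "__main__":
    output = simulate_distinct([[1, 2, 2, 3], [3, 4, 1, 5]], num_partitions=2)
    print(sorted(x for part in output for x in part))  # [1, 2, 3, 4, 5]
```

Routing by key hash is the design choice that makes the per-partition deduplication correct: no element can have a duplicate hiding in another partition after the shuffle.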

Posted by Mikester on Wed, 22 Sep 2021 17:46:44 +0200

[Hard Hadoop] Getting started with Hadoop: setting up a Hadoop runtime environment on CentOS

This article supplements the Hadoop part of the "Learning Guide for Big Data Specialists from Zero (Full Upgrade)". 1. Template virtual machine environment preparation. 0) Install the template virtual machine: IP address 192.168.10.100, host name hadoop100, 4 GB of memory, 50 GB hard disk. 1) The configuration requirements for the hadoop100 virtual machine are a ...

Posted by fael097 on Tue, 21 Sep 2021 03:32:56 +0200

Introduction to Java -- inner classes and lambda expressions

1. Inner classes. 1.1 Basic use of inner classes (understand). Concept: an inner class is a class defined inside another class. For example, if class B is defined inside class A, B is called an inner class. Definition format and example: /* Format: class OuterClassName { modifier class intern ...

Posted by langer on Sun, 12 Sep 2021 09:42:59 +0200

Kafka, RabbitMQ, RocketMQ: practical application development summary, part 1

Summary of practical applications of Kafka, RabbitMQ and RocketMQ. 1. Kafka. Drawing on the use cases from the official websites, this article records examples and practical applications of the three mainstream message queues (MQs). It does not cover installing and configuring the environments, but it includes fairly comprehensive code (inclu ...

Posted by The Cat on Sun, 12 Sep 2021 03:35:20 +0200

Big Data: Flink Window Operations

1. The four cornerstones of Flink. Flink would not be as popular as it is without its four most important cornerstones: Checkpoint, State, Time and Window. ◼ Checkpoint: this is one of Flink's most important features. Flink implements distributed, consistent snapshots based on the Chandy-Lamport algorithm, which provide consistency semantics. The Chandy-Lamport a ...
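To make the Window cornerstone concrete, here is a plain-Python sketch, not PyFlink code, of how events are assigned to tumbling event-time windows. With window size s and offset o, the window containing timestamp t starts at t - (t - o) mod s, which as far as we know matches the arithmetic of Flink's tumbling window assigner; each window is the half-open interval [start, start + s). The helper names and the sample events are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def window_start(timestamp_ms: int, size_ms: int, offset_ms: int = 0) -> int:
    """Start of the tumbling window [start, start + size_ms) that contains timestamp_ms."""
    # Python's % returns a non-negative result for a positive size_ms,
    # so timestamps before the offset are handled too.
    return timestamp_ms - (timestamp_ms - offset_ms) % size_ms


def count_per_window(events: Iterable[Tuple[str, int]], size_ms: int) -> Dict[Tuple[str, int, int], int]:
    """Count events per (key, window_start, window_end), like a keyed tumbling-window count."""
    counts: Counter = Counter()
    for key, ts in events:
        start = window_start(ts, size_ms)
        counts[(key, start, start + size_ms)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical (key, event time in ms) pairs.
    events = [("a", 1_000), ("a", 4_999), ("b", 5_000), ("a", 5_001), ("a", 12_000)]
    for window, n in sorted(count_per_window(events, size_ms=5_000).items()):
        print(window, n)
```

Note that an event at exactly 5,000 ms falls into the window starting at 5,000, because window ends are exclusive.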

Posted by rvpals on Fri, 10 Sep 2021 01:37:26 +0200

Inside Spark Technology: a detailed explanation of Shuffle

Next, we look at the implementation in more detail. Shuffle is undoubtedly a key point in performance tuning, and this article analyzes the implementation of Spark Shuffle in depth from the source-code perspective. The upper boundary of each Stage requires either reading data from external storage or r ...
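In Spark's sort-based shuffle, each map task writes its output as a single data file sorted by target partition, together with an index file of offsets, and each reduce task then reads only its own segment from every map output. The plain-Python model below is a sketch of that layout under simplifying assumptions: lists stand in for files, records are routed with hash(key) % num_reducers, and the function names are hypothetical.

```python
from typing import Any, List, Tuple

Record = Tuple[Any, Any]


def shuffle_write(records: List[Record], num_reducers: int) -> Tuple[List[Record], List[int]]:
    """Model of one map task's output: records grouped by reducer id plus an offset index."""
    # Tag each record with its target partition; sorted() is stable, so input order
    # is preserved within a partition.
    tagged = sorted(((hash(k) % num_reducers, (k, v)) for k, v in records), key=lambda t: t[0])
    data = [rec for _, rec in tagged]

    # Build num_reducers + 1 offsets: data[index[p]:index[p + 1]] belongs to reducer p.
    index = [0] * (num_reducers + 1)
    for pid, _ in tagged:
        index[pid + 1] += 1
    for p in range(num_reducers):
        index[p + 1] += index[p]
    return data, index


def shuffle_read(map_outputs: List[Tuple[List[Record], List[int]]], reducer_id: int) -> List[Record]:
    """Model of a reduce task fetching its own segment from every map output."""
    fetched: List[Record] = []
    for data, index in map_outputs:
        fetched.extend(data[index[reducer_id]:index[reducer_id + 1]])
    return fetched


if __name__ == "__main__":
    # Two hypothetical map tasks with integer keys.
    outputs = [shuffle_write([(1, "a"), (2, "b"), (3, "c")], 2),
               shuffle_write([(4, "d"), (5, "e")], 2)]
    for r in range(2):
        print(r, shuffle_read(outputs, r))
```

Keeping one sorted file plus an index per map task, rather than one file per reducer, is what keeps the number of shuffle files proportional to the number of map tasks.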

Posted by rbrown on Wed, 08 Sep 2021 04:50:23 +0200

Java: splitting a List into batches and importing it into a database with multiple threads

1. Foreword. A couple of days ago we built an import feature. At first the import was very slow: importing 20,000 records took more than a minute. We then optimized it step by step, from inserting the whole list into MySQL directly, to inserting the list into MySQL in batches, to inserting the batches into MySQL with multiple threads, and each step took less time. It was very cool, and ...

Posted by marshdabeachy on Wed, 08 Sep 2021 04:27:50 +0200

How to view a Flink job's execution plan

When an application's requirements are relatively simple, only a few operators may be involved in transforming the data. As the requirements grow more complex, however, a Job may contain dozens or even hundreds of operators, and the whole application becomes very complex, so it w ...

Posted by navtheace on Sun, 05 Sep 2021 02:17:18 +0200

Python Programming: From Introduction to Practice, Chapter 3 exercises

This is my second day of learning Python from scratch. I actually made rapid progress yesterday and kept going until exercise 3-7. I wanted to use a while loop for it, but I didn't know much about Python's while loop yet (and, honestly, it was too late to keep learning), so I finished 3-7 this morning. 3-1 Names: store the names of some friend ...
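For reference, a minimal solution to exercise 3-1 (Names) might look like the following. The friend names are made up; each name is printed by accessing the list elements one at a time by index.

```python
# Exercise 3-1 (Names): store a few friends' names in a list called names,
# then print each name by accessing the list elements one at a time.
names = ["alice", "bob", "carol"]  # hypothetical friend names

print(names[0].title())  # Alice
print(names[1].title())  # Bob
print(names[2].title())  # Carol
```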

Posted by felodiaz on Fri, 03 Sep 2021 05:58:30 +0200

Asked why I learned Python web crawling? Because I like pretty girls

Search Baidu for "sexy beauty wallpapers" and have a look; it's for no other reason than to learn Python better! Do you believe that? Whenever I see a website with photos of pretty girls, I just want to download them in bulk! Why learn web crawling? (1) By learning web crawling, you can build a customized search engine and better understand the principle o ...
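The first step of such a bulk downloader is collecting the image links on a page. Below is a minimal sketch that uses only the standard library (html.parser and urllib.parse). It does not fetch anything itself; the class and function names, the sample page and the URLs are hypothetical, and a real crawler would first download the HTML (for example with urllib.request) before parsing it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names, so "IMG"/"SRC" also match.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths such as "/img/1.jpg" against the page URL.
                self.image_urls.append(urljoin(self.base_url, value))


def extract_image_urls(html: str, base_url: str) -> List[str]:
    """Return the absolute URLs of all images referenced in html."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


if __name__ == "__main__":
    page = '<html><body><img src="/img/1.jpg"><p>text</p><img src="https://cdn.example.com/2.png"></body></html>'
    print(extract_image_urls(page, "https://example.com/gallery/"))
    # ['https://example.com/img/1.jpg', 'https://cdn.example.com/2.png']
```

Resolving every src with urljoin matters because pages mix relative and absolute image paths, and a downloader needs absolute URLs.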

Posted by kappaluppa on Sat, 20 Jun 2020 05:55:04 +0200