Introduction to Spark development
What is Spark
The Hadoop ecosystem is divided into the distributed file system HDFS, the computing framework MapReduce, and the resource scheduling framework Yarn. As time went on, however, MapReduce's heavy disk I/O, frequent network communication, and forced disk writes seriously slowed down the operation speed of the who ...
Posted by switchdoc on Sun, 09 Jan 2022 10:15:36 +0100
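To make the contrast with MapReduce concrete, here is a minimal sketch of a Spark word count using the Java API, assuming a local master; the input path and application name are illustrative and not taken from the post above.

```java
// Minimal Spark word-count sketch (Java API, spark-core dependency assumed).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class SparkWordCount {
    public static void main(String[] args) {
        // Run Spark locally with all cores; on a cluster this would be "yarn" or a standalone URL.
        SparkConf conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input/words.txt"); // hypothetical input path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.collect().forEach(t -> System.out.println(t._1() + "\t" + t._2()));
        }
    }
}
```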
MapReduce learning 1: overview and simple case preparation
1, Overview
1.1MapReduce definition
MapReduce is a programming framework for distributed computing programs and the core framework for users to develop "Hadoop-based data analysis applications".
The core function of MapReduce is to integrate the business logic code written by the user with its own default components into a complete ...
Posted by pckidcomplainer on Sat, 08 Jan 2022 13:35:01 +0100
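The following is a minimal sketch of how that user-written business logic plugs into the framework, assuming the classic word-count shape; class names and field layout are illustrative.

```java
// The user supplies only the map and reduce logic; the framework handles splitting,
// shuffling, grouping and scheduling.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountLogic {
    // Map phase: the framework feeds one line at a time; the user emits (word, 1) pairs.
    public static class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text word = new Text();
        private final IntWritable one = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, one);
            }
        }
    }

    // Reduce phase: the framework groups values by key; the user sums them.
    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}
```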
Resource scheduling in Yarn
Three Scheduling Strategies
The three scheduling policies, listed from left to right, are the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. They are introduced below.
FIFO Scheduler: a first-in, first-out scheduling policy. Tasks are executed in order, and resources are released only after the preceding tasks finish. This is unreasona ...
Posted by ConnorSBB on Sat, 08 Jan 2022 04:07:34 +0100
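As a small illustration of how these policies touch application code, the sketch below submits a MapReduce job to a named queue, which is how the Capacity and Fair Schedulers partition cluster resources; the queue name "analytics" is hypothetical.

```java
// Sketch: choosing the Yarn queue a job is submitted to.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The queue this job lands in decides how the scheduler shares resources with other jobs.
        conf.set("mapreduce.job.queuename", "analytics"); // hypothetical queue name
        Job job = Job.getInstance(conf, "queue-demo");
        // ... set mapper/reducer/input/output here before submitting ...
        // job.waitForCompletion(true);
    }
}
```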
Data analysis & node management & setting up NFS gateway service | Cloud computing
1. Data analysis
1.1 problems
This case requires statistical analysis exercises:
Use the client to create the input directory on HDFS; upload *.txt files to the input directory; call the cluster to analyze the uploaded files and count the words with the most occurrences
1.2 steps
To implement this case, you need to follow the following st ...
Posted by Q695 on Wed, 05 Jan 2022 18:31:13 +0100
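A minimal sketch of the client-side part of the exercise above, assuming the HDFS Java API; the NameNode URI, user name, and local file path are assumptions, not values from the post.

```java
// Create the input directory on HDFS and upload a local *.txt file from a Java client.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsInputSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address and user are assumptions; match fs.defaultFS in core-site.xml.
        try (FileSystem fs = FileSystem.get(new URI("hdfs://192.168.1.50:9000"), conf, "root")) {
            fs.mkdirs(new Path("/input"));                                        // create the input directory
            fs.copyFromLocalFile(new Path("/root/words.txt"), new Path("/input/")); // upload a *.txt file
        }
    }
}
```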
Big data and Hadoop & distributed file systems & distributed Hadoop clusters | Cloud computing
1. Deploy Hadoop
1.1 problems
This case requires the installation of stand-alone Hadoop:
Hot word analysis: minimum configuration of 2 CPUs, 2 GB of memory, and a 10 GB hard disk; virtual machine IP 192.168.1.50 (hadoop1); install and deploy Hadoop; run data analysis to find the most frequently occurring words
1.2 steps
To implement this case, you need to ...
Posted by ball420 on Wed, 05 Jan 2022 18:23:47 +0100
MapReduce framework principle - InputFormat data input
Contents
1, Introduction to InputFormat
2, Slicing and MapTask parallelism
3, Job submission process source code
4, InputFormat implementation subclass
5, Slicing mechanism of FileInputFormat
(1) Slicing mechanism:
(2) Slice source code analysis
(3) Slicing steps
(4) FileInputFormat default slice size parameter configuration ...
Posted by cihan on Tue, 04 Jan 2022 16:10:16 +0100
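A short sketch of the slice-size logic the post walks through: FileInputFormat computes splitSize = max(minSize, min(maxSize, blockSize)), so by default a slice equals one HDFS block; the concrete values below are illustrative.

```java
// How the default slice size falls out of FileInputFormat's parameters.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // Raising minsize above the block size makes splits larger (fewer MapTasks);
        // lowering maxsize below the block size makes splits smaller (more MapTasks).
        FileInputFormat.setMinInputSplitSize(job, 1);                  // mapreduce.input.fileinputformat.split.minsize
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024); // mapreduce.input.fileinputformat.split.maxsize

        long blockSize = 128L * 1024 * 1024;  // dfs.blocksize on the cluster (assumed 128 MB)
        long minSize = 1, maxSize = 128L * 1024 * 1024;
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        System.out.println("computed split size = " + splitSize);
    }
}
```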
Use of the MapReduce Framework: Join
Contents
1. Introduction
2. Use of Join in the relational database MySQL
Cartesian product: CROSS JOIN
Inner join: INNER JOIN
Left join: LEFT JOIN
Right join: RIGHT JOIN
Outer join: OUTER JOIN
3. Reduce Join
1. Introduction to Reduce Join
2. Case
2.1 Requirements:
2.2 Implementation idea: reduce-side table ...
Posted by Seol on Tue, 04 Jan 2022 11:58:06 +0100
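A compact sketch of the reduce-side join idea outlined above, assuming two tab-separated input files (orders and a product table) joined on an id in the first column; the file names and field layout are hypothetical, not taken from the post.

```java
// Reduce join: the mapper tags each record with its source file, the reducer combines them.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ReduceJoinSketch {
    // Map side: key by the join field and tag the record with the file it came from.
    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            String[] fields = value.toString().split("\t");
            String joinKey = fields[0]; // assume the product id is the first column in both files
            context.write(new Text(joinKey), new Text(fileName + "#" + value));
        }
    }

    // Reduce side: records sharing a product id meet here; separate them by tag and combine.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> orders = new ArrayList<>();
            String product = null;
            for (Text v : values) {
                String[] parts = v.toString().split("#", 2);
                if (parts[0].startsWith("product")) product = parts[1]; // assumed product file name prefix
                else orders.add(parts[1]);
            }
            for (String order : orders) {
                context.write(key, new Text(order + "\t" + product));
            }
        }
    }
}
```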
Spring Boot integrates Spark and submits Spark tasks to YARN (spark on yarn)
Preface
The previous project was based on Spring Boot with Spark integrated, running in standalone mode. I wrote a blog about it earlier; link:
https://blog.csdn.net/qq_41587243/article/details/112918052?spm=1001.2014.3001.5501
The same scheme is used now, but Spark is submitted to the production YARN cluster, and Kerberos authentication is required, ...
Posted by not_john on Mon, 03 Jan 2022 23:15:07 +0100
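A hedged sketch of the flow the post describes: authenticate with Kerberos first, then hand the job to the YARN cluster via SparkLauncher. The principal, keytab, jar path, and main class below are placeholders, not values from the post.

```java
// Kerberos login followed by a programmatic spark-on-yarn submission.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class YarnKerberosSubmit {
    public static void main(String[] args) throws Exception {
        // Kerberos login before touching the cluster.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("etl@EXAMPLE.COM",                 // placeholder principal
                "/etc/security/keytabs/etl.keytab");                                 // placeholder keytab

        // Submit the Spark application to YARN in cluster mode.
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("yarn")
                .setDeployMode("cluster")
                .setAppResource("/opt/jobs/my-spark-job.jar")   // hypothetical fat jar
                .setMainClass("com.example.SparkJob")           // hypothetical entry point
                .setConf("spark.yarn.principal", "etl@EXAMPLE.COM")
                .setConf("spark.yarn.keytab", "/etc/security/keytabs/etl.keytab")
                .startApplication();
        System.out.println("submitted, state = " + handle.getState());
    }
}
```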
Spark on Hive and Hive on Spark for Big Data Hadoop
1. Differences between Spark on Hive and Hive on Spark
1) Spark on Hive
In Spark on Hive, Hive serves only as storage, while Spark is responsible for SQL parsing, optimization, and execution. You can understand it as Spark using Hive-style SQL statements through Spark SQL to operate on Hive tables, with Spark RDDs running at the bottom. The steps are as follows: ...
Posted by joejoejoe on Mon, 03 Jan 2022 02:40:47 +0100
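A minimal sketch of the Spark on Hive setup described above, where Hive supplies only the metastore and storage while Spark SQL does the parsing and execution; the table name is hypothetical.

```java
// Spark on Hive: Spark SQL reads table definitions from the Hive metastore and runs the query.
import org.apache.spark.sql.SparkSession;

public class SparkOnHiveSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-on-hive")
                .enableHiveSupport()     // use the Hive metastore for table metadata
                .getOrCreate();

        // The SQL text looks like Hive, but it is parsed, optimized, and executed by Spark.
        spark.sql("SELECT word, COUNT(*) AS cnt FROM default.words GROUP BY word ORDER BY cnt DESC")
             .show();

        spark.stop();
    }
}
```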
Building Hadoop on a virtual machine (pseudo-distributed and fully distributed setups)
After learning Hadoop for a semester, I have finally chewed through this big bone, tears!!! This article is more of a summary of my Hadoop learning.
1, Preparatory work
1. hadoop compressed package
The compressed package is available on the official website; download it and have it ready. I use version 2.7.1.
2. JDK compressed package
This is the Java r ...
Posted by many_pets on Sat, 01 Jan 2022 19:05:16 +0100