Spark on YARN - source code analysis of how Spark submits tasks to a YARN cluster
Contents
1, Entry class - SparkSubmit
2, SparkApplication startup - JavaMainApplication, YarnClusterApplication
3, SparkContext initialization
4, YarnClientSchedulerBackend and YarnClusterSchedulerBackend initialization
5, ApplicationMaster startup
6, Spark on YARN task submission process summary
1, Entry class - SparkSubmit
When sub ...
Posted by suepahfly on Tue, 04 Jan 2022 10:03:33 +0100
SparkCore learning notes
I RDD overview
1.1 what is RDD
An RDD (Resilient Distributed Dataset) is the most basic data abstraction in Spark. In code, it is an abstract class that represents a resilient, immutable, partitioned collection whose elements can be computed in parallel.
1.2 RDD features
(1) Flexibility Elastici ...
Posted by lancet2003 on Tue, 04 Jan 2022 03:33:45 +0100
Integrating Spark with Spring Boot and submitting Spark jobs to YARN (Spark on YARN)
preface
The previous project was built on Spring Boot with Spark integrated, running in standalone mode. I wrote a blog post about it, link:
https://blog.csdn.net/qq_41587243/article/details/112918052?spm=1001.2014.3001.5501
The same approach is used now, but Spark jobs are submitted to the production YARN cluster, and Kerberos authentication is required, ...
Posted by not_john on Mon, 03 Jan 2022 23:15:07 +0100
Multi-index queries in Elasticsearch
1, The origin of the problem
In Elasticsearch queries, we usually specify the indexes to search directly in the URL. If we need to query many indexes whose names follow no pattern, we run into an awkward problem: the URL exceeds its length limit.
2, Test environment
elasticsearch 6.8.12
test data
Add three test indexes, one document ...
Posted by brodwilkinson on Mon, 03 Jan 2022 14:09:11 +0100
Hadoop storage and analysis
Apache Hadoop
Background
With the growth of the Internet and the Internet of Things, the interconnection of everything has become an unstoppable trend. This has driven architectures to evolve from monolithic designs to highly concurrent distributed ones. Data storage also began to evolve from the original stand ...
Posted by ZephyrWest on Mon, 03 Jan 2022 12:59:49 +0100
[review] RDD action operators
3, RDD action operators
An action operator is a method that triggers Job execution.
1, reduce
function signature
def reduce(f: (T, T) => T): T
function description
Aggregates all elements in the RDD: data is first aggregated within each partition, and the per-partition results are then aggregated across partitions.
val rdd: RDD[Int] = sc.m ...
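The two-stage aggregation described above can be illustrated without a Spark cluster. The following is only a minimal sketch in plain Scala: nested `Seq`s stand in for an RDD's partitions, and `ReduceSketch` and `reduceLikeRdd` are hypothetical names chosen for illustration, not part of the Spark API.

```scala
// Plain Scala sketch (no Spark dependency).
// Assumption: a Seq of Seqs stands in for the partitions of an RDD.
object ReduceSketch {
  // Hypothetical helper: reduce within each partition, then across partitions.
  def reduceLikeRdd[T](partitions: Seq[Seq[T]])(f: (T, T) => T): T = {
    // Step 1: aggregate the data inside each non-empty partition.
    val partials = partitions.filter(_.nonEmpty).map(_.reduce(f))
    // Step 2: aggregate the per-partition results.
    partials.reduce(f)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2), Seq(3, 4))
    println(reduceLikeRdd(partitions)(_ + _)) // prints 10
  }
}
```

Because the partial results are combined in a second step, the function passed to `reduce` should be associative and commutative; otherwise the result may depend on how the data happens to be partitioned.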
Posted by vapour_ on Mon, 03 Jan 2022 10:46:22 +0100
Detailed installation tutorial of three node big data environment
preface
This article belongs to the column "100 problems to solve the installation and deployment of big data". The column is the author's original work; please cite the source when quoting it. If you find any shortcomings or errors, please point them out in the comment area. Thank you!
For the directory structure and references of this column ...
Posted by beesgirl713 on Mon, 03 Jan 2022 07:05:22 +0100
Using SparkLauncher to launch Spark jobs from code
background
The project needs to process many files, some of which are several GB in size. A dedicated Spark program was written to handle such files, and to keep processing unified within the application, the Spark job has to be launched from code to handle the large files.
Implementation scheme
After investigation, it is foun ...
Posted by nitestryker on Mon, 03 Jan 2022 02:54:31 +0100
Spark on Hive and Hive on Spark for Big Data Hadoop
1. Differences between Spark on Hive and Hive on Spark
1)Spark on Hive
In Spark on Hive, Hive serves only as the storage layer, while Spark is responsible for SQL parsing, optimization, and execution. In other words, Spark uses Spark SQL to run Hive statements against Hive tables, with Spark RDDs doing the work underneath. The steps are as follows: ...
Posted by joejoejoe on Mon, 03 Jan 2022 02:40:47 +0100
Scala variables and data types
Variables and data types
Comments
Comments work exactly as they do in Java, so there is not much to say here.
// Single-line comment
/*
multi-line comment
*/
/**
 * Documentation comment
 */
variable and constant
Constant: a value that cannot be changed during program execution
// variable
// var variable name [: variable type] = initial value
var i ...
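As a small illustration of the declaration syntax above, the sketch below contrasts a reassignable `var` with a `val`, which is how Scala declares a constant. The object name `VarValSketch` and the names `count` and `limit` are made up for this example.

```scala
// Minimal sketch of variable and constant declarations in Scala.
object VarValSketch {
  // Returns the final values so the behaviour can be checked.
  def demo(): (Int, Int) = {
    // var: a variable; the type annotation is optional and it can be reassigned.
    var count: Int = 1
    count = count + 1

    // val: a constant; it cannot be reassigned after initialization.
    val limit = 10
    // limit = 20  // would not compile: reassignment to val

    (count, limit)
  }

  def main(args: Array[String]): Unit = {
    val (count, limit) = demo()
    println(s"count=$count, limit=$limit") // prints count=2, limit=10
  }
}
```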
Posted by nuttycoder on Sun, 02 Jan 2022 19:16:02 +0100