Spark on YARN - source code analysis of how Spark submits tasks to a YARN cluster

Contents 1. Entry class - SparkSubmit 2. SparkApplication startup - JavaMainApplication, YarnClusterApplication 3. SparkContext initialization 4. YarnClientSchedulerBackend and YarnClusterSchedulerBackend initialization 5. ApplicationMaster startup 6. Summary of the Spark on YARN task submission process 1. Entry class - SparkSubmit When sub ...

Posted by suepahfly on Tue, 04 Jan 2022 10:03:33 +0100

SparkCore learning notes

I. RDD overview 1.1 What is an RDD? An RDD (Resilient Distributed Dataset) is the most basic data abstraction in Spark. In code it is an abstract class that represents a resilient, immutable, partitioned collection whose elements can be computed in parallel. 1.2 RDD features (1) Elasticity Elastici ...

Posted by lancet2003 on Tue, 04 Jan 2022 03:33:45 +0100

Integrating Spark with Spring Boot and submitting Spark tasks to YARN (Spark on YARN)

Preface: The previous project was based on Spring Boot with Spark integrated, running in standalone mode. I wrote a blog post about it: https://blog.csdn.net/qq_41587243/article/details/112918052?spm=1001.2014.3001.5501 The same approach is used now, but Spark jobs are submitted to the production YARN cluster, and Kerberos authentication is required, ...

Posted by not_john on Mon, 03 Jan 2022 23:15:07 +0100

Multi-index queries in Elasticsearch

1. Origin of the problem: When querying Elasticsearch, we usually specify the index to search directly in the URL. If we need to query many indexes whose names follow no pattern, we run into an awkward problem: the request exceeds the URL length limit. 2. Test environment: Elasticsearch 6.8.12. Test data: add three test indexes, one document ...

Posted by brodwilkinson on Mon, 03 Jan 2022 14:09:11 +0100

Hadoop storage and analysis

Apache Hadoop Background: With the growth of the internet and the Internet of Things, the interconnection of everything has become inevitable. This has driven architectures to evolve from a single architecture to a highly concurrent distributed architecture. Data storage also began to evolve from the original stand ...

Posted by ZephyrWest on Mon, 03 Jan 2022 12:59:49 +0100

[Review] RDD action operators

3. RDD action operators: An action operator is a method that triggers Job execution. 1. reduce Function signature: def reduce(f: (T, T) => T): T Function description: aggregates all elements in the RDD, first within each partition and then across partitions. val rdd: RDD[Int] = sc.m ...

Posted by vapour_ on Mon, 03 Jan 2022 10:46:22 +0100

Detailed installation tutorial for a three-node big data environment

Preface: This article belongs to the column "100 problems to solve the installation and deployment of big data". The column is the author's original work; please cite the source when quoting. Please point out any deficiencies or errors in the comment area. Thank you! For the directory structure and references of this column ...

Posted by beesgirl713 on Mon, 03 Jan 2022 07:05:22 +0100

Using SparkLauncher to invoke Spark jobs from code

Background: The project has to process many files, some of them several GB in size. A dedicated Spark program was written for such files, so to keep processing unified, the Spark job must be invoked from code to handle the large files. Implementation scheme: After investigation, it is foun ...

Posted by nitestryker on Mon, 03 Jan 2022 02:54:31 +0100

Spark on Hive and Hive on Spark for Big Data Hadoop

1. Differences between Spark on Hive and Hive on Spark 1) Spark on Hive: In Spark on Hive, Hive serves only as storage, while Spark is responsible for SQL parsing, optimization, and execution. In other words, Spark uses Spark SQL to operate on Hive tables with Hive statements, with Spark RDDs running underneath. The steps are as follows: ...

Posted by joejoejoe on Mon, 03 Jan 2022 02:40:47 +0100

Scala variables and data types

Variables and data types Comments: Exactly the same as in Java, so there is not much to say here. // Single-line comment /* Multi-line comment */ /* * Documentation comment */ Variables and constants Constant: a variable whose value does not change during program execution // variable // var variable name [: variable type] = initial value var i ...

Posted by nuttycoder on Sun, 02 Jan 2022 19:16:02 +0100