Spark source code: tracking RDD logic code execution
1, Driver
val sparkConf = new SparkConf().setAppName("wordCount").setMaster("local[3]")
val sparkContext = new SparkContext(sparkConf)
val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3) // 3 partitions
val rdd_increase = rdd.map(_ + 1) // narrow (one-to-one) dependency, no shuffle
rdd_increase.collect() // action: triggers the job
sparkContext.stop()
The above code creates two RDDs without a shuffle dependency, so there are ...
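To see this from the driver side, here is a minimal sketch (same local setup as above; the output noted in the comments is only what one would typically expect) that inspects the dependency type and lineage of rdd_increase:

import org.apache.spark.{SparkConf, SparkContext}

object NarrowDependencyCheck {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("wordCount").setMaster("local[3]")
    val sparkContext = new SparkContext(sparkConf)

    val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3)
    val rdd_increase = rdd.map(_ + 1)

    // map adds only a OneToOneDependency (narrow), so no shuffle stage is created
    println(rdd_increase.dependencies)  // expected: a single OneToOneDependency
    println(rdd_increase.toDebugString) // lineage: MapPartitionsRDD <- ParallelCollectionRDD

    sparkContext.stop()
  }
}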
Posted by irandoct on Fri, 19 Nov 2021 10:59:06 +0100
Object oriented and advanced syntax of Scala
1, Object oriented
1. Class and object details
(1) Class composition structure
Constructor, member variable, member method (function), local variable, code block, internal class
(2) Constructor
Scala has two kinds of constructors: a primary constructor and auxiliary (secondary) constructors. The primary constructor follows the class name, such as class Student2(val name: String, ...
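A minimal sketch of both constructor kinds, extending the Student2 example above (the age field, the default value 18, and the usage lines are assumptions added for illustration):

// Primary constructor: its parameter list follows the class name
class Student2(val name: String, val age: Int) {
  // Auxiliary (secondary) constructor: must begin by calling another constructor
  def this(name: String) = this(name, 18) // assumed default age, for illustration only
}

// Usage (REPL-style)
val s1 = new Student2("Tom", 20) // primary constructor
val s2 = new Student2("Jerry")   // auxiliary constructor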
Posted by tartou2 on Wed, 20 Oct 2021 02:05:53 +0200
Custom partitioning and sorting within partitions
Simple wordCount
Suppose our file contains the following data (a wordCount sketch follows the sample data):
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoop
spark
spark
hive
hadoo ...
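A minimal wordCount sketch over such a file (the path data/words.txt is a placeholder assumption; each line of the file is taken to hold exactly one word):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    sc.textFile("data/words.txt")  // assumed path to the sample data above
      .map(word => (word, 1))      // each line is one word, so no split is needed
      .reduceByKey(_ + _)          // shuffle: aggregate the counts per word
      .collect()
      .foreach(println)            // prints (word, count) pairs such as (spark, ...)

    sc.stop()
  }
}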
Posted by dieselmachine on Fri, 15 Oct 2021 01:41:08 +0200
Learning Java, Day 04 (October 14, 2021)
JavaDay04
Java environment construction
JDK download and installation
Uninstall the original environment and configure the JDK
Delete the original environment variable: System -> Environment Variables -> JAVA_HOME, find the file location that its value points to -> delete it -> then delete the environment variable ...
Posted by CostaKapo on Thu, 14 Oct 2021 21:47:35 +0200
The essence of Scala is all here; take it and you won't be afraid of interviews.
preface
As an object-oriented, functional programming language, Scala combines object-oriented and functional programming, making code more concise, efficient, and easier to understand. That is why Scala is popular.
As a ...
Posted by Jak on Mon, 11 Oct 2021 03:38:55 +0200
Spark transformation operators and a case study
Transformation operators
map: data in the same partition is processed in order, while different partitions run in no particular order
(it fetches data one element at a time within each partition, which is flexible but inefficient)
val result1: RDD[Int] = rdd.map(num => {
  println(num) // executed once per element
  num * 2
})
mapPartitions: fetch one partition at a time and cal ...
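For contrast, a minimal mapPartitions sketch over the same rdd (the println text is illustrative only); the function receives an iterator over a whole partition, so per-partition setup work runs once rather than once per element:

val result2: RDD[Int] = rdd.mapPartitions(iter => {
  println("processing one partition") // executed once per partition
  iter.map(num => num * 2)            // still transforms every element
})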
Posted by ciaran on Sun, 10 Oct 2021 12:07:26 +0200
Spark advanced: using DataFrame and DataSet
Spark advanced (V): using DataFrame and DataSet
DataFrame is a programming abstraction provided by Spark SQL. Like RDD, it is a distributed data collection; unlike RDD, the data in a DataFrame is organized into named columns, just like a table in a relational database. In addition, a variety of data can be transformed into D ...
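A minimal sketch (the SparkSession named spark and the Person case class are assumptions for illustration) showing the same data viewed as a typed DataSet and as a DataFrame with named columns:

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int) // hypothetical schema for illustration

val spark = SparkSession.builder().appName("dfDemo").master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Person("Tom", 20), Person("Jerry", 18)).toDS() // typed Dataset[Person]
val df = ds.toDF()                                          // DataFrame with named columns

df.select("name").show()
ds.filter(_.age > 18).show()

spark.stop()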
Posted by StormS on Thu, 07 Oct 2021 10:51:16 +0200
Spark big data analysis practice - company sales data analysis
demand
Suppose a company provides you with the following data. The data consists of three .txt files: date data, order header data, and order detail data. You are asked to carry out the following analyses on the data provided by the company. 1. Calculate the annual number of sales orders and the total sales amount across all orders. 2. ...
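A minimal sketch for requirement 1 only; the file name orders.txt, the column layout (orderId, orderDate, amount), and the date format are all assumptions and would need to be adjusted to the real files:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("salesAnalysis").master("local[*]").getOrCreate()

// assumed layout of the order header file: orderId,orderDate,amount
val orders = spark.read.csv("data/orders.txt").toDF("orderId", "orderDate", "amount")

orders
  .withColumn("year", substring(col("orderDate"), 1, 4)) // assumes dates start with yyyy
  .groupBy("year")
  .agg(countDistinct("orderId").as("orderCount"),
       sum(col("amount").cast("double")).as("totalSales"))
  .orderBy("year")
  .show()

spark.stop()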
Posted by bodzan on Wed, 06 Oct 2021 21:39:02 +0200
Scala Series 1 - basic syntax
1, Overview
1. Scala is a multi-paradigm, statically typed programming language; it supports both object-oriented and functional programming. 2. Scala source code (.scala) is compiled into Java bytecode (.class) and then runs on the JVM, and it can call existing Java class libraries, enabling seamless interoperability between the two languages.
...
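A minimal sketch of that seamless connection: Scala code calling existing Java class libraries directly (the classes used here are just examples from the Java standard library):

import java.util.{ArrayList => JArrayList}
import java.time.LocalDate

object JavaInterop {
  def main(args: Array[String]): Unit = {
    val list = new JArrayList[String]() // a Java collection used from Scala, no wrapper needed
    list.add("scala")
    list.add("jvm")
    println(list.size()) // 2

    println(LocalDate.now().getYear) // plain Java API call, compiled to the same bytecode
  }
}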
Posted by Tandem on Thu, 30 Sep 2021 04:25:33 +0200
Spark SQL: an API for structured data operations based on Spark
Introduction to Spark SQL
Spark SQL is one of the most complex components in the Spark stack. It provides the ability to operate on structured data within a Spark program, that is, to run SQL queries. Specifically, Spark SQL has the following three important features:
1. Spark SQL supports reading multiple structured data formats, such as JSON, Parquet ...
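A minimal sketch of feature 1 (the file paths and the name/age fields are placeholders): reading JSON and Parquet with the DataFrameReader and querying the result with SQL:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sparkSqlDemo").master("local[*]").getOrCreate()

val jsonDF = spark.read.json("data/people.json") // one JSON object per line
val parquetDF = spark.read.parquet("data/people.parquet")

jsonDF.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 18").show() // assumes name/age fields exist

spark.stop()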
Posted by Xurion on Thu, 23 Sep 2021 06:12:04 +0200