Spark source code: tracking the execution of RDD logic code

1. Driver: val sparkConnf = new SparkConf().setAppName("wordCount").setMaster("local[3]"); val sparkContext = new SparkContext(sparkConnf); val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3); val rdd_increace = rdd.map(_ + 1); rdd_increace.collect(); sparkContext.stop(). The above code creates two RDDs with no shuffle dependency, so there are ...
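A runnable sketch of the driver from the excerpt, with the original variable names kept and comments added; the local[3] master and the 3-partition parallelize are the article's own settings:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDriver {
  def main(args: Array[String]): Unit = {
    // Run locally with 3 threads, as in the excerpt
    val sparkConnf = new SparkConf().setAppName("wordCount").setMaster("local[3]")
    val sparkContext = new SparkContext(sparkConnf)

    // Source RDD with 3 partitions
    val rdd = sparkContext.parallelize(Array(1, 2, 3, 4, 5), 3)

    // map is a narrow transformation, so there is no shuffle dependency
    // between the two RDDs
    val rdd_increace = rdd.map(_ + 1)

    // collect() triggers the job and returns Array(2, 3, 4, 5, 6)
    println(rdd_increace.collect().mkString(", "))

    sparkContext.stop()
  }
}
```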

Posted by irandoct on Fri, 19 Nov 2021 10:59:06 +0100

Object-oriented and advanced syntax of Scala

1. Object-oriented. 1. Class and object details. (1) Class composition: constructors, member variables, member methods (functions), local variables, code blocks, inner classes. (2) Constructors: Scala has two kinds of constructors, primary and auxiliary. The primary constructor follows the class name, as in class Student2(val name: String, ...
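A short sketch of the primary/auxiliary constructor distinction described in the excerpt; only the class name Student2 and its first parameter come from the article, the second parameter and the demo object are illustrative:

```scala
// Primary constructor follows the class name; the age parameter is an assumption
class Student2(val name: String, val age: Int) {
  // A code block in the class body runs as part of the primary constructor
  println(s"primary constructor: $name, $age")

  // Auxiliary (secondary) constructor: its first statement must call the primary one
  def this(name: String) = {
    this(name, 18)
    println("auxiliary constructor")
  }
}

object ConstructorDemo extends App {
  val s1 = new Student2("Alice", 20) // built via the primary constructor
  val s2 = new Student2("Bob")       // built via the auxiliary constructor, age defaults to 18
}
```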

Posted by tartou2 on Wed, 20 Oct 2021 02:05:53 +0200

Custom partitioning and sorting within partitions

Simple wordCount. Suppose there is some data in our file: spark spark hive hadoop spark spark hive hadoop spark spark hive hadoop (the same words repeated many more times) ...
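A sketch of the idea named in the title, custom partitioning plus sorting inside each partition, built on the wordCount data above; the partition routing, partition count, and sample input are assumptions, not the article's exact code:

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Custom partitioner: route each word to a fixed partition (illustrative mapping)
class WordPartitioner extends Partitioner {
  override def numPartitions: Int = 3
  override def getPartition(key: Any): Int = key.toString match {
    case "spark" => 0
    case "hive"  => 1
    case _       => 2   // everything else, e.g. "hadoop"
  }
}

object PartitionAndSortDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("partitionSort").setMaster("local[*]"))

  val words  = sc.parallelize(Seq("spark", "spark", "hive", "hadoop", "spark", "hive"))
  val counts = words.map((_, 1)).reduceByKey(_ + _)

  // Repartition with the custom partitioner and sort keys inside each partition
  val partitionedSorted = counts.repartitionAndSortWithinPartitions(new WordPartitioner)

  // Print the contents of each partition to show the placement and ordering
  partitionedSorted.glom().collect().zipWithIndex.foreach { case (part, i) =>
    println(s"partition $i: ${part.mkString(", ")}")
  }
  sc.stop()
}
```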

Posted by dieselmachine on Fri, 15 Oct 2021 01:41:08 +0200

Learning Java, Day 04 (October 14, 2021)

JavaDay04: Java environment setup. JDK download and installation: uninstall the original environment, then configure the JDK. Delete the original environment variable: System -> Environment Variables -> find the file location that JAVA_HOME points to -> delete it -> then delete the environment variable ...

Posted by CostaKapo on Thu, 14 Oct 2021 21:47:35 +0200

The essence of Scala is here; take it and you won't be afraid of interviews.

Preface: As a language that is both object-oriented and functional, Scala combines object-oriented programming with functional programming, making code more concise, efficient, and easier to understand. That's why Scala is popular. As a ...

Posted by Jak on Mon, 11 Oct 2021 03:38:55 +0200

Spark transformation operators and a case study

Transformation operator map: within the same partition the data is processed in order, while different partitions run in no fixed order (it processes the data in each partition one element at a time, which is flexible but inefficient). val result1: RDD[Int] = rdd.map(num => { println(num); num * 2 }). mapPartitions: fetch one partition at a time and cal ...
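A self-contained sketch contrasting map and mapPartitions as described in the excerpt; the 2-partition input data is illustrative:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapOperatorsDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("mapDemo").setMaster("local[*]"))
  val rdd: RDD[Int] = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)

  // map: invoked once per element; order is preserved inside a partition,
  // but there is no ordering guarantee across partitions
  val result1: RDD[Int] = rdd.map { num =>
    println(num)
    num * 2
  }

  // mapPartitions: invoked once per partition with an iterator over that
  // partition's elements, so any per-partition setup work is done only once
  val result2: RDD[Int] = rdd.mapPartitions { iter =>
    iter.map(_ * 2)
  }

  println(result1.collect().mkString(", "))
  println(result2.collect().mkString(", "))
  sc.stop()
}
```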

Posted by ciaran on Sun, 10 Oct 2021 12:07:26 +0200

Spark advanced: using DataFrame and DataSet

Spark advanced (V): using DataFrame and DataSet. DataFrame is a programming abstraction provided by Spark SQL. Like an RDD, it is a distributed collection of data; unlike an RDD, the data of a DataFrame is organized into named columns, just like a table in a relational database. In addition, many kinds of data can be transformed into D ...
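A minimal sketch of the DataFrame/Dataset distinction described above; the Person case class, column names, and sample rows are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Case class that gives the Dataset a typed schema (illustrative fields)
case class Person(name: String, age: Int)

object DataFrameDataSetDemo extends App {
  val spark = SparkSession.builder().appName("dfDemo").master("local[*]").getOrCreate()
  import spark.implicits._

  // DataFrame: distributed rows organized into named columns, like a table
  val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")
  df.printSchema()
  df.filter($"age" > 30).show()

  // Dataset: like a DataFrame, but typed by the Person case class
  val ds = Seq(Person("Alice", 29), Person("Bob", 31)).toDS()
  ds.map(p => p.name.toUpperCase).show()

  spark.stop()
}
```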

Posted by StormS on Thu, 07 Oct 2021 10:51:16 +0200

Spark big data analysis practice - company sales data analysis

Requirements: Suppose a company provides you with the following data. The data consists of three .txt files: date data, order header data, and order detail data. Carry out the following analysis based on the data provided by the company. 1. Calculate the number of sales orders and the total sales per year across all orders. 2. ...
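A rough sketch of requirement 1 (orders and total sales per year). The file names, column names, delimiters, and join keys below are assumptions; the article's actual schemas may differ:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesAnalysisSketch extends App {
  val spark = SparkSession.builder().appName("salesAnalysis").master("local[*]").getOrCreate()

  // Assumed layout: date table, order header table, order detail table
  val dates = spark.read.option("delimiter", "\t").csv("tbDate.txt")
    .toDF("dateId", "year", "month", "day")
  val orders = spark.read.option("delimiter", "\t").csv("tbStock.txt")
    .toDF("orderId", "locationId", "dateId")
  val details = spark.read.option("delimiter", "\t").csv("tbStockDetail.txt")
    .toDF("orderId", "rowId", "itemId", "qty", "price", "amount")

  // Requirement 1: number of distinct orders and total sales amount per year
  orders.join(details, "orderId").join(dates, "dateId")
    .groupBy("year")
    .agg(
      countDistinct("orderId").as("orderCount"),
      sum(col("amount").cast("double")).as("totalSales")
    )
    .orderBy("year")
    .show()

  spark.stop()
}
```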

Posted by bodzan on Wed, 06 Oct 2021 21:39:02 +0200

Scala Series 1 - basic syntax

1. Overview. 1. Scala is a multi-paradigm, statically typed programming language; it supports both object-oriented and functional programming. 2. Scala source code (.scala) is compiled into Java bytecode (.class) and then runs on the JVM. You can call existing Java class libraries, giving seamless interoperation between the two languages ...
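A minimal sketch of the Java interoperability claim in point 2: Scala code compiled to JVM bytecode can call an existing Java class library (java.time here) with no bridging code:

```scala
import java.time.LocalDate

object ScalaJavaInterop extends App {
  // Calling a standard Java class directly from Scala
  val today: LocalDate = LocalDate.now()
  println(s"Hello from Scala on the JVM, today is $today")
}
```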

Posted by Tandem on Thu, 30 Sep 2021 04:25:33 +0200

Spark SQL: an API for structured data operations based on Spark

Introduction to Spark SQL: Spark SQL is one of the most complex components in the Spark stack. It provides the ability to operate on structured data inside a Spark program, that is, to run SQL queries. Specifically, Spark SQL has the following three important features: 1. Spark SQL supports reading multiple structured data formats, such as JSON, Parquet ...
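A small sketch of the first feature named above: reading a structured format (JSON) and querying it with SQL inside a Spark program. The file name and column names are assumptions:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlJsonDemo extends App {
  val spark = SparkSession.builder().appName("sparkSqlDemo").master("local[*]").getOrCreate()

  // Each line of people.json is assumed to be an object like {"name": "...", "age": ...}
  val people = spark.read.json("people.json")
  people.createOrReplaceTempView("people")

  // Run a SQL query against the registered view
  spark.sql("SELECT name, age FROM people WHERE age > 20").show()

  spark.stop()
}
```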

Posted by Xurion on Thu, 23 Sep 2021 06:12:04 +0200