Spark SQL insert into PostgreSQL: handling field type mismatch errors
1. Key error information
Caused by: org.postgresql.util.PSQLException: ERROR: column "c1" is of type point but expression is of type character
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
at org.postgres ...
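A common remedy for this class of error is to let the PostgreSQL server cast the incoming string itself. Below is a minimal sketch, assuming a DataFrame df whose string column targets the point column; the database URL, table name, and credentials are placeholders. The pgjdbc connection parameter stringtype=unspecified makes the driver send strings as untyped literals so the server can cast them to point:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PgPointWrite {
    // Write df to PostgreSQL; stringtype=unspecified lets the server
    // cast string literals to the target column type (here: point).
    public static void write(Dataset<Row> df) {
        df.write()
          .format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/mydb?stringtype=unspecified")
          .option("dbtable", "t")
          .option("user", "postgres")
          .option("password", "secret")
          .mode(SaveMode.Append)
          .save();
    }
}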
Posted by jek on Thu, 20 Jan 2022 04:23:07 +0100
How Spark SQL reads and writes Hive
Spark SQL needs Hive-related configuration to read and write Hive, so the hive-site.xml file is usually placed in Spark's conf directory. The code calls themselves are simple; the key is the source code analysis and how Spark interacts with Hive.
1. Code calls
Code for reading Hive:
SparkSession sparkSession = SparkSession.builder() ...
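The snippet above is cut off; a minimal self-contained sketch of the same call pattern might look like the following (the app name, database, and table are placeholders). enableHiveSupport() is what wires in the Hive metastore configured by the hive-site.xml on Spark's classpath:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveReadExample {
    public static void main(String[] args) {
        // enableHiveSupport() picks up hive-site.xml from spark/conf
        SparkSession spark = SparkSession.builder()
                .appName("read-hive")
                .enableHiveSupport()
                .getOrCreate();
        Dataset<Row> df = spark.sql("SELECT * FROM default.src LIMIT 10");
        df.show();
        spark.stop();
    }
}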
Posted by Brian W on Tue, 18 Jan 2022 02:41:37 +0100
Spark SQL implementation principle - logic plan optimization - operation push down: EliminateOuterJoin rule
This rule optimizes outer join operations, aiming to eliminate the outer join wherever possible by converting it into an inner join or another join type. The EliminateOuterJoin optimization rule can take effect when the join operati ...
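To make the situation concrete, here is a small self-contained illustration of the pattern this rule targets (not the rule's source code; table and column names are invented). The WHERE predicate is null-rejecting on the null-supplying side of a LEFT OUTER JOIN, so the optimizer can rewrite it as an inner join:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EliminateOuterJoinDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("eliminate-outer-join").master("local[*]").getOrCreate();
        spark.sql("CREATE TEMPORARY VIEW o AS SELECT 1 AS cid, 10 AS amt");
        spark.sql("CREATE TEMPORARY VIEW c AS SELECT 1 AS id, 'a' AS name");
        // c.id > 0 filters out the null rows a left join would produce,
        // so the optimized plan should show the join downgraded to Inner
        Dataset<Row> q = spark.sql(
            "SELECT * FROM o LEFT JOIN c ON o.cid = c.id WHERE c.id > 0");
        q.explain(true);
        spark.stop();
    }
}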
Posted by plasko on Mon, 17 Jan 2022 03:36:37 +0100
Digging into the Spark event bus: how event listening is implemented in Spark
Spark makes heavy use of event listening to implement communication between components on the driver side. This article explains how event listening is implemented in Spark.
The observer pattern and listeners
The observer pattern, one of the classic design patterns, establishes a dependency between objects: when an object's state chang ...
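As a taste of the listener side of this mechanism, here is a minimal sketch (the class name is invented) of a custom SparkListener registered on the driver, which then receives the events posted to the bus:

import org.apache.spark.SparkContext;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;

public class JobLoggingListener extends SparkListener {
    // Called by the event bus when a job starts / ends on the driver
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        System.out.println("job started: " + jobStart.jobId());
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        System.out.println("job ended: " + jobEnd.jobId());
    }

    public static void register(SparkContext sc) {
        sc.addSparkListener(new JobLoggingListener());
    }
}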
Posted by Mouse on Mon, 17 Jan 2022 02:42:46 +0100
Spark core programming - Introduction to RDD (resilient distributed dataset), RDD core attributes, RDD execution principle
preface
For high-concurrency, high-throughput data processing, the Spark computing framework encapsulates three data structures to handle different application scenarios. The three data structures are:
RDD: resilient distributed dataset
Accumulator: distributed shared write-only variable
Broadcast variable: distributed shared read-only variable
C ...
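A minimal sketch showing the three structures side by side (all values and names are invented): an RDD holds the data, a broadcast variable carries shared read-only state, and an accumulator collects shared write-only counts:

import java.util.Arrays;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class ThreeStructures {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("three-structures").master("local[*]").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        Broadcast<Integer> threshold = jsc.broadcast(2);                      // read-only
        LongAccumulator hits = spark.sparkContext().longAccumulator("hits");  // write-only

        jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5))                         // the RDD
           .filter(x -> x > threshold.value())
           .foreach(x -> hits.add(1));

        System.out.println("values above threshold: " + hits.value());
        spark.stop();
    }
}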
Posted by deth4uall on Fri, 14 Jan 2022 21:41:16 +0100
Application practice of a TensorFlow for Java + Spark Scala distributed machine learning computing framework
Wang Hui joined Qunar.com in 2017. He is currently responsible for anti-crawler risk control business, covers a wide range of technical fields, and continues to explore the practice of intelligent risk control.
I. Preface
In Qunar's intelligent risk control scenarios, the risk control R&D team often applies som ...
Posted by moon 111 on Fri, 14 Jan 2022 02:20:46 +0100
Introduction to Spark development
What is Spark
The Hadoop ecosystem consists of the distributed file system HDFS, the computing framework MapReduce, and the resource scheduling framework Yarn. With the development of the times, however, MapReduce's intensive disk I/O, frequent network communication, and hard-coded design seriously slow down the operation speed of the who ...
Posted by switchdoc on Sun, 09 Jan 2022 10:15:36 +0100
[Spark] User-defined functions: UDF and UDAF
All the data used in this article comes from the user JSON shown below
{"username": "zhangsan","age": 20} {"username": "lisi","age": 21} {"username": "wangwu","age": 19}
Custom UDF
Introduction to UDF
UDF: takes one row in and returns one result, a one-to-one relationship; pass a value into the function and it will return ...
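For a concrete one-in, one-out example, here is a minimal sketch against the user JSON above (the function name addName and the file path are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class UdfExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("udf-demo").master("local[*]").getOrCreate();

        // One row in, one value out: prefix each username
        spark.udf().register("addName",
                (UDF1<String, String>) name -> "Name: " + name,
                DataTypes.StringType);

        Dataset<Row> df = spark.read().json("user.json");
        df.createOrReplaceTempView("user");
        spark.sql("SELECT addName(username) AS username, age FROM user").show();
        spark.stop();
    }
}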
Posted by shaunie123 on Wed, 05 Jan 2022 21:44:19 +0100
Spark + parse text + recursion + pattern matching + broadcast filtering
Contents
Requirement: given a number of tables, find out how many of them are used in the code
Reframed: given keywords, count how many of them are hit in the log file
Technology choice: Spark parses the keyword list into rdd1; Spark parses the file directory data into rdd2; rdd1 join rdd2 (broadcast)
The code:
Step 1: create SparkContext
Step 2: read the file ...
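A minimal sketch of the broadcast idea described above (paths and keywords are placeholders): instead of a physical join, the small keyword list is broadcast as a set and the big file RDD is filtered against it:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastKeywordFilter {
    public static void main(String[] args) {
        // Step 1: create the SparkContext
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("keyword-filter").setMaster("local[*]"));

        // The small side: keyword list, broadcast to every executor
        Set<String> keywords = new HashSet<>(Arrays.asList("table_a", "table_b"));
        Broadcast<Set<String>> bc = sc.broadcast(keywords);

        // Step 2: read the files and filter with the broadcast set
        JavaRDD<String> lines = sc.textFile("logs/*.txt");
        long hits = lines.filter(
                line -> bc.value().stream().anyMatch(line::contains)).count();

        System.out.println("lines containing a keyword: " + hits);
        sc.stop();
    }
}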
Posted by shan111 on Tue, 04 Jan 2022 15:35:44 +0100
Spark on YARN - source code analysis of how Spark submits tasks to a YARN cluster
Contents
1. Entry class - SparkSubmit
2. SparkApplication startup - JavaMainApplication, YarnClusterApplication
3. SparkContext initialization
4. YarnClientSchedulerBackend and YarnClusterSchedulerBackend initialization
5. ApplicationMaster startup
6. Summary of the Spark on Yarn task submission process
1. Entry class - SparkSubmit
When sub ...
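For orientation, a typical submission in cluster mode looks like the following (the class name and jar are placeholders); it is this command that enters the source through the SparkSubmit class analyzed below:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar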
Posted by suepahfly on Tue, 04 Jan 2022 10:03:33 +0100