Spark SQL insert into PostgreSQL: handling field type mismatch errors
1. Key error information
Caused by: org.postgresql.util.PSQLException: ERROR: column "c1" is of type point but expression is of type character
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
at org.postgres ...
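A common remedy for this class of error is to let the PostgreSQL server cast the incoming string itself. Below is a minimal sketch, assuming a DataFrame df whose string column targets the point column; the database URL, table name, and credentials are placeholders. The pgjdbc connection parameter stringtype=unspecified makes the driver send strings as untyped literals so the server can cast them to point:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PgPointWrite {
    // Write df to PostgreSQL; stringtype=unspecified lets the server
    // cast string literals to the target column type (here: point).
    public static void write(Dataset<Row> df) {
        df.write()
          .format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/mydb?stringtype=unspecified")
          .option("dbtable", "t")
          .option("user", "postgres")
          .option("password", "secret")
          .mode(SaveMode.Append)
          .save();
    }
}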
Posted by jek on Thu, 20 Jan 2022 04:23:07 +0100
How Spark SQL reads and writes Hive
Spark SQL needs Hive-related configuration to read and write Hive, so the hive-site.xml file is usually placed in Spark's conf directory. The code calls themselves are simple; the key is the source code analysis and how Spark interacts with Hive.
1. Code calls
Code for reading Hive:
SparkSession sparkSession = SparkSession.builder() ...
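The snippet above is cut off; a minimal self-contained sketch of the same call pattern might look like the following (the app name, database, and table are placeholders). enableHiveSupport() is what wires in the Hive metastore configured by the hive-site.xml on Spark's classpath:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveReadExample {
    public static void main(String[] args) {
        // enableHiveSupport() picks up hive-site.xml from spark/conf
        SparkSession spark = SparkSession.builder()
                .appName("read-hive")
                .enableHiveSupport()
                .getOrCreate();
        Dataset<Row> df = spark.sql("SELECT * FROM default.src LIMIT 10");
        df.show();
        spark.stop();
    }
}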
Posted by Brian W on Tue, 18 Jan 2022 02:41:37 +0100
Spark SQL implementation principle - logic plan optimization - operation push down: EliminateOuterJoin rule
This rule optimizes outer join operations, aiming to eliminate the outer join wherever possible by converting it into an inner join or another join type. The EliminateOuterJoin optimization rule can take effect when the join operati ...
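To make the situation concrete, here is a small self-contained illustration of the pattern this rule targets (not the rule's source code; table and column names are invented). The WHERE predicate is null-rejecting on the null-supplying side of a LEFT OUTER JOIN, so the optimizer can rewrite it as an inner join:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EliminateOuterJoinDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("eliminate-outer-join").master("local[*]").getOrCreate();
        spark.sql("CREATE TEMPORARY VIEW o AS SELECT 1 AS cid, 10 AS amt");
        spark.sql("CREATE TEMPORARY VIEW c AS SELECT 1 AS id, 'a' AS name");
        // c.id > 0 filters out the null rows a left join would produce,
        // so the optimized plan should show the join downgraded to Inner
        Dataset<Row> q = spark.sql(
            "SELECT * FROM o LEFT JOIN c ON o.cid = c.id WHERE c.id > 0");
        q.explain(true);
        spark.stop();
    }
}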
Posted by plasko on Mon, 17 Jan 2022 03:36:37 +0100
Digging into the Spark event bus: how event listening is implemented in Spark
Spark makes heavy use of event listening to implement communication between components on the driver side. This article explains how event listening is implemented in Spark.
The observer pattern and listeners
The observer pattern, one of the classic design patterns, establishes a dependency between objects: when an object's state chang ...
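As a taste of the listener side of this mechanism, here is a minimal sketch (the class name is invented) of a custom SparkListener registered on the driver, which then receives the events posted to the bus:

import org.apache.spark.SparkContext;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;

public class JobLoggingListener extends SparkListener {
    // Called by the event bus when a job starts / ends on the driver
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        System.out.println("job started: " + jobStart.jobId());
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        System.out.println("job ended: " + jobEnd.jobId());
    }

    public static void register(SparkContext sc) {
        sc.addSparkListener(new JobLoggingListener());
    }
}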
Posted by Mouse on Mon, 17 Jan 2022 02:42:46 +0100
Spark core programming - Introduction to RDD (resilient distributed dataset), RDD core attributes, RDD execution principle
preface
For high-concurrency, high-throughput data processing, the Spark computing framework encapsulates three data structures to handle different application scenarios. The three data structures are:
RDD: resilient distributed dataset
Accumulator: distributed shared write-only variable
Broadcast variable: distributed shared read-only variable
C ...
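A minimal sketch showing the three structures side by side (all values and names are invented): an RDD holds the data, a broadcast variable carries shared read-only state, and an accumulator collects shared write-only counts:

import java.util.Arrays;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class ThreeStructures {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("three-structures").master("local[*]").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        Broadcast<Integer> threshold = jsc.broadcast(2);                      // read-only
        LongAccumulator hits = spark.sparkContext().longAccumulator("hits");  // write-only

        jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5))                         // the RDD
           .filter(x -> x > threshold.value())
           .foreach(x -> hits.add(1));

        System.out.println("values above threshold: " + hits.value());
        spark.stop();
    }
}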
Posted by deth4uall on Fri, 14 Jan 2022 21:41:16 +0100
Application practice of a TensorFlow for Java + Spark Scala distributed machine learning computing framework
Wang Hui joined Qunar.com in 2017. He is currently responsible for anti-crawler risk control business, covers a wide range of technical fields, and continues to explore the practice of intelligent risk control.
I. Preface
In Qunar's intelligent risk control scenarios, the risk control R&D team often applies som ...
Posted by moon 111 on Fri, 14 Jan 2022 02:20:46 +0100
Introduction to Spark development
What is Spark
The Hadoop ecosystem consists of the distributed file system HDFS, the computing framework MapReduce, and the resource scheduling framework Yarn. With the development of the times, however, MapReduce's intensive disk I/O, frequent network communication, and hard-coded design seriously slow down the operation speed of the who ...
Posted by switchdoc on Sun, 09 Jan 2022 10:15:36 +0100
[Spark] User-defined functions: UDF and UDAF
All the data used in this article comes from the user JSON shown below
{"username": "zhangsan","age": 20} {"username": "lisi","age": 21} {"username": "wangwu","age": 19}
Custom UDF
Introduction to UDF
UDF: takes one row in and returns one result, a one-to-one relationship; pass a value into the function and it will return ...
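For a concrete one-in, one-out example, here is a minimal sketch against the user JSON above (the function name addName and the file path are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class UdfExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("udf-demo").master("local[*]").getOrCreate();

        // One row in, one value out: prefix each username
        spark.udf().register("addName",
                (UDF1<String, String>) name -> "Name: " + name,
                DataTypes.StringType);

        Dataset<Row> df = spark.read().json("user.json");
        df.createOrReplaceTempView("user");
        spark.sql("SELECT addName(username) AS username, age FROM user").show();
        spark.stop();
    }
}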
Posted by shaunie123 on Wed, 05 Jan 2022 21:44:19 +0100
Spark + parse text + recursion + pattern matching + broadcast filtering
Contents
Requirement: given a number of tables, find out how many of them are used in the code
Reframed: given keywords, count how many of them are hit in the log file
Technology choice: Spark parses the keyword list into rdd1; Spark parses the file directory data into rdd2; rdd1 join rdd2 (broadcast)
The code:
Step 1: create SparkContext
Step 2: read the file ...
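A minimal sketch of the broadcast idea described above (paths and keywords are placeholders): instead of a physical join, the small keyword list is broadcast as a set and the big file RDD is filtered against it:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastKeywordFilter {
    public static void main(String[] args) {
        // Step 1: create the SparkContext
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("keyword-filter").setMaster("local[*]"));

        // The small side: keyword list, broadcast to every executor
        Set<String> keywords = new HashSet<>(Arrays.asList("table_a", "table_b"));
        Broadcast<Set<String>> bc = sc.broadcast(keywords);

        // Step 2: read the files and filter with the broadcast set
        JavaRDD<String> lines = sc.textFile("logs/*.txt");
        long hits = lines.filter(
                line -> bc.value().stream().anyMatch(line::contains)).count();

        System.out.println("lines containing a keyword: " + hits);
        sc.stop();
    }
}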
Posted by shan111 on Tue, 04 Jan 2022 15:35:44 +0100
Spark on YARN - source code analysis of how Spark submits tasks to a YARN cluster
Contents
1. Entry class - SparkSubmit
2. SparkApplication startup - JavaMainApplication, YarnClusterApplication
3. SparkContext initialization
4. YarnClientSchedulerBackend and YarnClusterSchedulerBackend initialization
5. ApplicationMaster startup
6. Summary of the Spark on Yarn task submission process
1. Entry class - SparkSubmit
When sub ...
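For orientation, a typical submission in cluster mode looks like the following (the class name and jar are placeholders); it is this command that enters the source through the SparkSubmit class analyzed below:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar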
Posted by suepahfly on Tue, 04 Jan 2022 10:03:33 +0100