100-billion-scale data warehouse project (data warehouse theory: product dimension data loading (zipper table))

Product dimension data loading (zipper table). Zipper table design: 1. Collect the full data for the current day and store it in the ND (current day) table. 2. Take yesterday's full data from the history table and store it in the OD (previous day) table. 3. ND-OD is the data added and changed on ...

Posted by eagle1771 on Fri, 05 Jun 2020 06:27:04 +0200
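
The ND-OD step in the zipper table entry above can be sketched in a few lines. This is a minimal plain-Python illustration, not the post's implementation: the snapshots are modeled as dictionaries keyed by a hypothetical product id, and the `name`/`price` fields and the function name `added_or_changed` are assumptions made up for the example.

```python
# Sketch of "ND - OD": rows in today's full snapshot (ND) that are new
# or differ from yesterday's snapshot (OD). Keys and fields are hypothetical.
def added_or_changed(nd, od):
    """Return ND rows whose key is absent from OD or whose content changed."""
    return {key: row for key, row in nd.items() if od.get(key) != row}


# Yesterday's full data (OD), as taken from the history table.
od = {
    1: {"name": "pen", "price": 10},
    2: {"name": "ink", "price": 25},
}

# Today's full data (ND).
nd = {
    1: {"name": "pen", "price": 10},  # unchanged
    2: {"name": "ink", "price": 30},  # price changed
    3: {"name": "pad", "price": 15},  # newly added
}

delta = added_or_changed(nd, od)
print(sorted(delta))  # [2, 3]
```

Only products 2 and 3 end up in the delta; the unchanged product 1 is left out, which is what keeps a zipper table from storing a full copy of the data every day.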

Spark SQL -- Spark SQL performance optimization

Contents: 1. Cache table data in memory 2. Parameter optimization. 1. Cache table data in memory: Performance tuning is mainly about keeping data in memory. Caching data in memory improves performance because values are read directly from memory. With RDDs, use rdd.cache or rdd.persist to cac ...

Posted by abgoosht on Fri, 13 Mar 2020 08:27:46 +0100

Push mode integrates Flume and Spark Streaming

1. Architecture 2. Flume configuration: Create a new configuration file under $FLUME_HOME/conf: flume_push_streaming.conf. The configuration idea is as follows: use netcat for the source and configure its host name and port; use avro for the sink and configure its host name and port; channel s ...

Posted by phithe on Fri, 06 Mar 2020 12:11:34 +0100

Spark Big Data-Spark+Kafka Build Real-Time Analysis Dashboard

Spark+Kafka Build Real-Time Analysis Dashboard I. Framework: Spark+Kafka is used to analyze, in real time, the number of male and female shoppers per second. Spark Streaming processes the user shopping log in real time, then websocket is used to push the data to the browser in real ti ...

Posted by t31os on Fri, 17 Jan 2020 03:40:05 +0100

How to write results to MySQL in Spark

Spark here covers Spark Core, Spark SQL, and Spark Streaming; the operations are the same in all of them. The following shows code from an actual project. Method 1: write the entire DataFrame to MySQL in one go (the schema of the DataFrame should match the field names defined in the MySQL table) Dat ...

Posted by blacksheepradio on Wed, 11 Dec 2019 06:08:20 +0100

Advanced cases of Spark SQL

(1) Expert-level case -- word count with a UDTF. Data format: each line is a string of words separated by spaces. Code implementation: object SparkSqlTest { def main(args: Array[String]): Unit = { //Block redundant logs Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN) Logger.getLogger("org.apache.spark").setLevel(Leve ...

Posted by marklarah on Tue, 03 Dec 2019 04:36:38 +0100
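
The idea behind the UDTF word count above (one input line produces many output rows, which are then grouped and counted) can be illustrated without Spark. The following is a plain-Python sketch under stated assumptions, not the post's Scala code: `explode_words` and `word_count` are hypothetical names, and the sample lines are made up. In Spark SQL the same shape is commonly expressed with `explode` over `split`.

```python
from collections import Counter


def explode_words(line):
    """Table-generating step: one input line yields one row per word."""
    for word in line.split(" "):
        if word:  # skip empty strings produced by repeated spaces
            yield word


def word_count(lines):
    """Explode every line into words, then group by word and count."""
    return Counter(word for line in lines for word in explode_words(line))


# Made-up sample data in the described format: space-separated strings.
lines = ["hello spark", "hello sql", "spark sql udtf"]
counts = word_count(lines)
print(counts["hello"], counts["spark"], counts["udtf"])  # 2 2 1
```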

Java + Spark SQL: querying Excel

Download Spark from the official Spark website (Spark Download). Any version will do. After downloading, extract it and put it under bigdata (the directory can be changed). Download the winutils.exe file that Hadoop requires on Windows; it can be found online, so it is not uploaded here. In fact, this file is optional, and error reporting doesn' ...

Posted by GrayFox12 on Tue, 03 Dec 2019 04:21:44 +0100

Specific programming scenarios of Spark SQL

Introductory case: object SparkSqlTest { def main(args: Array[String]): Unit = { //Block redundant logs Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN) Logger.getLogger("org.apache.spark").setLevel(Level.WARN) Logger.getLogger("org.project-spark").setLevel(Level.WARN) //Building programmin ...

Posted by tigomark on Sun, 01 Dec 2019 01:17:03 +0100

JDBC data source of Spark SQL

JDBC data source: Spark SQL supports reading data from relational databases (such as MySQL) over JDBC. The data read is still represented as a DataFrame and can be easily processed with the operators provided by Spark Core. How to create it: To connect to MySQL when querying: It is very useful to use Spark SQL to process dat ...

Posted by Jeroen_nld on Thu, 28 Nov 2019 21:47:37 +0100

Spark 2.4.2 source compilation

Software versions: jdk: 1.8; maven: 3.6.1 (http://maven.apache.org/download.cgi); spark: 2.4.2 (https://archive.apache.org/dist/spark/spark-2.4.2/). Hadoop version: hadoop-2.6.0-cdh5.7.0 (the Hadoop version targeted by the Spark build; it does not need to be installed). Configure Maven: #Configure environment variables [root@hadoop004  ...

Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100