Hundred-billion-scale data warehouse project (warehouse theory: product dimension data loading (zipper table))
Product dimension data loading (zipper table)
Zipper table design:
1. Collect the current day's full data and store it in the ND (current day) table.
2. Extract yesterday's full data from the history table and store it in the OD (previous day) table.
3. ND-OD is the data added and changed on ...
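
The ND - OD step can be pictured with plain Scala collections. This is only a sketch under stated assumptions: the record type ProductRow, its fields, and the helper addedOrChanged are hypothetical (the excerpt does not show the table schema), and the real job would run against warehouse tables rather than in-memory sequences.

```scala
// Hypothetical record type; the excerpt does not show the real schema.
final case class ProductRow(id: Long, name: String, price: BigDecimal)

object ZipperDiff {
  /** Rows of ND that are absent from OD, or differ from the OD row with the same id. */
  def addedOrChanged(nd: Seq[ProductRow], od: Seq[ProductRow]): Seq[ProductRow] = {
    // Index yesterday's rows by key for constant-time lookup.
    val previous: Map[Long, ProductRow] = od.map(p => p.id -> p).toMap
    // Keep a row unless yesterday held an identical row under the same id.
    nd.filter(p => !previous.get(p.id).contains(p))
  }
}
```

Given an OD holding products 1 and 2 and an ND in which product 2 has a new price and product 3 is new, addedOrChanged returns the rows for products 2 and 3 and leaves the unchanged product 1 out.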
Posted by eagle1771 on Fri, 05 Jun 2020 06:27:04 +0200
Spark SQL -- Spark SQL performance optimization
Table of contents
1. Cache table data in memory
2. Parameter optimization
1. Cache table data in memory
Performance tuning here is mainly about keeping data in memory: reading cached data directly from memory avoids recomputing or reloading it. For an RDD, use rdd.cache or rdd.persist to cac ...
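
A minimal sketch of both kinds of caching, assuming Spark 2.x or later on the classpath. The view name "orders", the sample rows, and the helper cacheDemo are hypothetical and not taken from the post.

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  /** Caches an RDD and a temporary view, then reports whether the view is marked as cached. */
  def cacheDemo(spark: SparkSession): Boolean = {
    import spark.implicits._

    // RDD: cache() is shorthand for persist() with the default MEMORY_ONLY level.
    val rdd = spark.sparkContext.parallelize(1 to 1000).cache()
    rdd.count() // the first action materialises the cached partitions

    // Table: register a temporary view (hypothetical name) and cache it by name.
    Seq((1, "a"), (2, "b")).toDF("id", "label").createOrReplaceTempView("orders")
    spark.catalog.cacheTable("orders")
    val cached = spark.catalog.isCached("orders")

    // Release the memory once the data is no longer needed.
    spark.catalog.uncacheTable("orders")
    rdd.unpersist()
    cached
  }
}
```

Both calls are lazy: nothing is stored until an action first reads the data.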
Posted by abgoosht on Fri, 13 Mar 2020 08:27:46 +0100
Push mode integrates Flume and Spark Streaming
1. Architecture
2. Flume configuration
Create a new configuration file under $FLUME_HOME/conf: flume_push_streaming.conf
The configuration idea is as follows:
Source: select netcat, and configure the host name and port
Sink: select avro, and configure the host name and port
channel s ...
Posted by phithe on Fri, 06 Mar 2020 12:11:34 +0100
Spark Big Data-Spark+Kafka Build Real-Time Analysis Dashboard
Spark+Kafka Build Real-Time Analysis Dashboard
I. Framework
Spark+Kafka is used to count, in real time, the number of male and female shoppers per second. Spark Streaming processes the user shopping log in real time, and WebSocket then pushes the data to the browser in real ti ...
Posted by t31os on Fri, 17 Jan 2020 03:40:05 +0100
How to write results to MySQL in Spark
Spark here covers Spark Core, Spark SQL, and Spark Streaming; the write operations are the same in all of them. The code below comes from an actual project.
Method 1: write the entire DataFrame to MySQL in one go (the DataFrame schema must match the column names defined in the MySQL table)
Dat ...
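
Since the post's own code is cut off here, the following is only a sketch of what a whole-DataFrame write can look like with Spark's DataFrameWriter.jdbc. The helpers connectionProps and writeAll, the credentials, and the driver class (the one shipped with MySQL Connector/J 5.x) are assumptions, not the author's code.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object MySqlWriteSketch {
  /** Builds JDBC connection properties; user and password are placeholders. */
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    // Assumed driver class for Connector/J 5.x; Connector/J 8.x uses com.mysql.cj.jdbc.Driver.
    props.setProperty("driver", "com.mysql.jdbc.Driver")
    props
  }

  /** Appends every row of df to the given table; column names must match the table's. */
  def writeAll(df: DataFrame, url: String, table: String, props: Properties): Unit =
    df.write.mode(SaveMode.Append).jdbc(url, table, props)
}
```

SaveMode.Append adds rows to an existing table; SaveMode.Overwrite would replace the table's contents instead.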
Posted by blacksheepradio on Wed, 11 Dec 2019 06:08:20 +0100
Advanced Spark SQL examples
(1) Hardcore example -- word count with a UDTF
Data format: each line is a string of words separated by spaces. Code implementation:
object SparkSqlTest {
  def main(args: Array[String]): Unit = {
    // Suppress redundant logs
    Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN)
    Logger.getLogger("org.apache.spark").setLevel(Leve ...
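
The rest of the post's code is not shown, so here is a separate sketch of the same idea. It swaps in Spark SQL's built-in explode generator for a custom UDTF; the view name "lines" and the helper wordCount are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExplodeWordCount {
  /** Counts words in space-separated lines by exploding each line into one row per word. */
  def wordCount(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    // Hypothetical view name; each row holds one input line.
    lines.toDF("line").createOrReplaceTempView("lines")
    spark.sql(
      """SELECT word, count(*) AS cnt
        |FROM (SELECT explode(split(line, ' ')) AS word FROM lines) t
        |GROUP BY word""".stripMargin)
  }
}
```

The inner query is the table-generating step (one line in, many rows out); the outer query is an ordinary aggregation over the generated rows.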
Posted by marklarah on Tue, 03 Dec 2019 04:36:38 +0100
Java + Spark SQL: querying Excel
Download Spark from the official Spark website
Spark Download: any version will do. After downloading, extract it and place it under bigdata (the directory can be changed)
Download the winutils.exe file that Hadoop requires on Windows
You can find it online; it is not uploaded here. In fact, this file is optional, and error reporting doesn' ...
Posted by GrayFox12 on Tue, 03 Dec 2019 04:21:44 +0100
Specific programming scenarios of Spark SQL
Introductory example:
object SparkSqlTest {
  def main(args: Array[String]): Unit = {
    // Suppress redundant logs
    Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN)
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.project-spark").setLevel(Level.WARN)
    // Building programmin ...
Posted by tigomark on Sun, 01 Dec 2019 01:17:03 +0100
JDBC data source of spark SQL
JDBC data source
Spark SQL supports reading data from relational databases (such as MySQL) over JDBC. The data read is still represented as a DataFrame, so it can be processed with the usual Spark operators.
Creation method:
Connecting to MySQL when querying:
It is very useful to use Spark SQL to process dat ...
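
As a rough illustration of the JDBC read path, assuming a reachable MySQL instance and its JDBC driver on the classpath: the helpers jdbcOptions and readTable are hypothetical, and any URL, table name, or credentials passed to them are placeholders rather than values from the post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcReadSketch {
  /** Collects the standard JDBC data source options into one map. */
  def jdbcOptions(url: String, table: String, user: String, password: String): Map[String, String] =
    Map("url" -> url, "dbtable" -> table, "user" -> user, "password" -> password)

  /** Loads the table described by the options as a DataFrame. */
  def readTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()
}
```

The returned DataFrame can be filtered, joined, or registered as a temporary view like any other.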
Posted by Jeroen_nld on Thu, 28 Nov 2019 21:47:37 +0100
Spark 2.4.2 source compilation
Software version:
jdk: 1.8
maven: 3.6.1 http://maven.apache.org/download.cgi
spark: 2.4.2 https://archive.apache.org/dist/spark/spark-2.4.2/
Hadoop version: hadoop-2.6.0-cdh5.7.0 (the Hadoop version the Spark build targets; it does not need to be installed)
Configure Maven:
#Configure environment variables
[root@hadoop004 ...
Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100