Spark SQL -- Spark SQL performance optimization

Article directory: 1. Cache table data in memory 2. Parameter optimization. 1. Cache table data in memory: Performance tuning is mainly about keeping data in memory. Caching data in memory improves performance because values are read directly from memory. With RDDs, use rdd.cache or rdd.persist to cac ...

Posted by abgoosht on Fri, 13 Mar 2020 08:27:46 +0100

Scala -- 3. Functions

In Scala, methods and functions are almost the same (for example, they are defined, used, and executed in the same way), but functions can be used more flexibly. "Functional programming" describes a programming paradigm. It can be understood as fol ...

Posted by GiaTuan on Tue, 18 Feb 2020 14:07:21 +0100

Distributed ID - snowflake algorithm

Background: As business volume grows, databases are split ever more finely, and sharding across databases and tables is gradually being adopted. Generating primary key IDs with auto-increment keys or sequences no longer meets the demand, so the ...

Posted by rallokkcaz on Sat, 18 Jan 2020 10:12:11 +0100

Akka implementation of OAuth 2 service: access_token management

Implementing an OAuth 2 service involves several core points: OAuth 2 protocol parsing; there may be many connected users, so the system needs to support horizontal scaling; state control (validity) of each connected user's access_token; the service should support fault tolerance, recoverability, scalability, high concurrency and other ...

Posted by charlieholder on Fri, 10 Jan 2020 09:03:54 +0100

Scala notes (5): MySQL database configuration and Scala programming

MySQL database installation and configuration: To view and operate the database easily and intuitively, Navicat Premium, which supports multiple databases, is generally installed. The installation process is straightforward. It mainly follows the links below (personally tested, no pitfalls), which are the main ...

Posted by aleph_x on Sat, 04 Jan 2020 02:29:12 +0100

A small problem when running a Scala program from the java command line

Example of Scala accessing a MySQL database: import java.sql.{Connection,ResultSet,DriverManager} import scala.util.control.Exception.Catch //import java.sql.DriverManager object DataAnalysisTest { def main(args: Array[String]): Unit = { if(args.length!=3) { println("Parameter error!") println("usage metho ...

Posted by jdimino on Mon, 16 Dec 2019 19:38:07 +0100

Spark 2.4.2 source compilation

Software versions: jdk: 1.8; maven: 3.6.1 (http://maven.apache.org/download.cgi); spark: 2.4.2 (https://archive.apache.org/dist/spark/spark-2.4.2/). Hadoop version: hadoop-2.6.0-cdh5.7.0 (the Hadoop version the Spark build targets; it does not need to be installed). Configure maven: #Configure environment variables [root@hadoop004  ...

Posted by buddymoore on Wed, 20 Nov 2019 18:01:18 +0100

[LeetCode] 76. Minimum Window Substring

Problem link: https://leetcode-cn.com/problems/minimum-window-substring/ Problem description: Given a string S and a string T, find the smallest substring of S that contains all the letters of T. Example: Input: S = "ADOBECODEBANC", T = "ABC" Output: "BANC" Approach: sliding window. We only need to ensure that the ...

Posted by Saeven on Sat, 02 Nov 2019 08:13:15 +0100

JavaSE_Multithread--Thread Synchronization Method--Synchronized Block

Introduction and outline: One of the most common situations in concurrent programming is that more than one thread of execution uses shared resources. In a concurrent application, it is normal for multiple threads to read or write the same data or access the same file or database connection. These shared resources can ca ...

Posted by emilyfrazier on Wed, 24 Jul 2019 04:22:02 +0200

Talking about concurrent programming: Future model (Java, Clojure, Scala multilingual perspective)

0x00 Preface: The Future pattern is actually not far from us. If you have worked with excellent open source projects such as Spark and Hadoop, watch their output logs when you run a program, and you may well stumble upon Future. There are many excellent design patterns in the field of concurrent programming, such as t ...

Posted by theDog on Sat, 06 Jul 2019 19:23:12 +0200