Introduction and description
Four traditional mainstream databases:
Oracle, MySQL, SQL Server, DB2
Non-relational databases: Redis, MongoDB
The mainstream databases are relational: tables are linked to one another through relationships.
When we say "install a database", we mean installing the database service.
When creating a database, it refers t ...
Posted by rockobop on Fri, 25 Feb 2022 14:05:34 +0100
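The excerpt above says relational tables are associated with one another. Below is a minimal sketch of such an association using SQLite through Python's built-in `sqlite3` module, so no database service needs to be installed. The `dept`/`emp` tables and their columns are hypothetical examples, not taken from the original post. A foreign key links the two tables, and a JOIN follows that link.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE emp ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES dept(id))"  # the association between the tables
)
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "R&D")])
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Alice", 1), (2, "Bob", 2), (3, "Carol", 2)],
)

# Follow the relationship with a JOIN.
rows = conn.execute(
    "SELECT emp.name, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id "
    "ORDER BY emp.id"
).fetchall()
print(rows)

# The foreign key rejects an employee that points at a non-existent department.
try:
    conn.execute("INSERT INTO emp VALUES (4, 'Dave', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

A non-relational store such as Redis has no such enforced link between records. Any relationship there has to be maintained by the application.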
Azkaban is a batch workflow job scheduler created at LinkedIn. It is mainly used to run a set of jobs and processes in a specific order within a workflow. Dependencies are configured through simple <key, value> pairs in the job configuration. Azkaban uses job profiles to establish dependencies betwe ...
Posted by suzuki on Mon, 10 Jan 2022 23:38:51 +0100
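The Azkaban excerpt above describes jobs whose run order comes from key=value dependency settings. The sketch below parses job-file-style text with a `dependencies=` key and derives a valid run order by topological sort, using `graphlib.TopologicalSorter` (Python 3.9+). The job names and file contents are hypothetical. This is an illustration of the ordering idea, not Azkaban's own scheduler code.

```python
from graphlib import TopologicalSorter

# Hypothetical job-file contents in a key=value style.
JOB_FILES = {
    "extract": "type=command\ncommand=echo extract\n",
    "clean": "type=command\ncommand=echo clean\ndependencies=extract\n",
    "load": "type=command\ncommand=echo load\ndependencies=clean\n",
    "report": "type=command\ncommand=echo report\ndependencies=clean,load\n",
}

def parse_props(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def run_order(job_files):
    """Build a job -> {dependencies} graph and return a valid execution order."""
    graph = {}
    for name, text in job_files.items():
        deps = parse_props(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    # static_order() yields every job after all of its dependencies
    # and raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())

order = run_order(JOB_FILES)
print(order)
```

The sorter rejects a cyclic configuration outright. A workflow scheduler also has to reject cycles, because cyclic jobs could never all start.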
1. Data warehouse concept
A data warehouse (abbreviated DW or DWH) is a strategic collection of data that supports all of an enterprise's decision-making processes. Analyzing the data in a data warehouse can help an enterprise improve business processes, control costs, and improve product quality. A data warehouse is n ...
Posted by ddragas on Sat, 01 Jan 2022 01:39:50 +0100
Hive supports the following storage formats: TEXTFILE (row-oriented), SEQUENCEFILE (row-oriented), ORC (column-oriented), and PARQUET (column-oriented).
1. Column storage and row storage
In the figure above, the left side shows a logical table; on the right, the first layout is row storage and the second is column storage.
The storage ...
Posted by Lol5916 on Fri, 31 Dec 2021 12:59:52 +0100
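The figure described in the excerpt above contrasts row storage with column storage. The following sketch lays out a small hypothetical table both ways as a flat list, standing in for bytes on disk. It shows why reading a single column is a contiguous scan in a columnar format such as ORC or Parquet. Real formats add encoding, compression, and indexes on top of this basic layout.

```python
# A small logical table (hypothetical data): rows of (id, name, age).
table = [
    (1, "Tom", 20),
    (2, "Amy", 25),
    (3, "Bob", 30),
]

def row_layout(rows):
    """Row storage: all fields of row 1, then all fields of row 2, ..."""
    return [value for row in rows for value in row]

def column_layout(rows):
    """Column storage: all values of column 1, then column 2, ..."""
    return [value for column in zip(*rows) for value in column]

row_data = row_layout(table)
col_data = column_layout(table)
print(row_data)
print(col_data)

# Reading only the "age" column: in column storage it is one contiguous slice.
n = len(table)
ages = col_data[2 * n : 3 * n]
print(ages)
```

In the row layout, the same three ages sit at every third position, so a query that needs only `age` still has to read past every `id` and `name`.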
1. Creating a structure
create <structure type> <structure name> <structure definition>;
2. Displaying structures
show <structure type, in plural>; (e.g. show tables;)
Show how a structure was created:
show create <structure type> <structure name>;
3. Data operations (on a data table)
insert into <table name> values (<values>);
select <columns> from <table name>;
Posted by herreram on Mon, 27 Dec 2021 21:39:55 +0100
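The statement templates in the excerpt above can be tried without installing a server by using SQLite through Python's `sqlite3` module. SQLite has no `SHOW` statement; MySQL supports `show tables` and `show create table`. As a stand-in, this sketch queries SQLite's catalog table `sqlite_master`. The `student` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. create <structure type> <structure name> <structure definition>;
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")

# 2. SQLite has no SHOW statement; its catalog table sqlite_master plays
#    the role of "show tables" / "show create table" here.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
create_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]
print(tables)
print(create_sql)

# 3. insert into <table name> values (...); select <columns> from <table name>;
conn.execute("INSERT INTO student VALUES (1, 'Tom')")
conn.execute("INSERT INTO student VALUES (2, 'Amy')")
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)
```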
The DBLINK built here lets a Dameng (DM) database on Linux access a SQL Server database on Windows. ODBC is configured to connect to SQL Server, and DM then creates an ODBC-based DBLINK over that connection, linking DM to SQL Server. The specific steps are as follows.
DBLINK (Datab ...
Posted by Dustin013 on Thu, 23 Dec 2021 18:52:15 +0100
Hive: advanced topics
Using Hive
Hive bucketed tables
1. How bucketed tables work
Bucketing is a finer-grained division than partitioning. A Hive table, or a partition of a table, can be further divided into buckets: Hive hashes the value of a chosen column for each row and uses the hash to determine which bucket th ...
Posted by luisluis on Wed, 08 Dec 2021 08:35:11 +0100
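The bucketing principle in the excerpt above is "hash a column, then pick a bucket". The sketch below shows that idea for an INT bucketing column. It assumes the classic scheme, in which an int hashes to its own value (as with Java's `Integer.hashCode()`) and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`. Newer Hive versions (bucketing_version 2) switched to a Murmur3 hash, so real bucket numbers may differ. The rows and the bucket count of 3 are hypothetical.

```python
# Hypothetical rows: (user_id, name); user_id is the bucketing column.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]
NUM_BUCKETS = 3  # assumed, as in CLUSTERED BY (user_id) INTO 3 BUCKETS

def java_int_hash(value):
    """Assumption: Java's Integer.hashCode() returns the int value itself."""
    return value

def bucket_of(value, num_buckets):
    # Clear the sign bit (like hash & Integer.MAX_VALUE), then take the modulus.
    return (java_int_hash(value) & 0x7FFFFFFF) % num_buckets

buckets = {b: [] for b in range(NUM_BUCKETS)}
for row in rows:
    buckets[bucket_of(row[0], NUM_BUCKETS)].append(row)
print(buckets)
```

Rows with equal bucketing-column values always land in the same bucket. This is what lets bucketed tables support efficient sampling and bucket-wise joins.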
Function: clean the collected log data, filtering out invalid records and requests for static resources
Method: write a MapReduce job to do the processing
1) Entity class (Bean)
Describes the fields of a log record, such as client IP, request URL, and request status
2) Utility class
Used to process Beans: set the validity or invalidity of log ...
Posted by KRAK_JOE on Fri, 03 Dec 2021 17:01:43 +0100
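The excerpt above outlines a Bean, a utility class that marks records valid or invalid, and a MapReduce pass. The following sketch mirrors that structure in plain Python, with a map-only filtering step. Everything concrete here is an assumption for illustration: the access-log format and the regex, the list of static suffixes, and the validity rule (status >= 400 or a static resource means invalid). The original post's actual rules are truncated.

```python
import re
from dataclasses import dataclass

# Assumed access-log format: ip - - [time] "METHOD url PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)'
)
# Assumed static-resource suffixes to filter out.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

@dataclass
class LogBean:
    """Entity class: the fields of one log record."""
    ip: str
    time: str
    method: str
    url: str
    status: int
    valid: bool = True

def parse(line):
    """Utility: build a LogBean and mark it valid or invalid (assumed rule)."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # malformed line: dropped
    ip, time, method, url, status, _ = m.groups()
    bean = LogBean(ip, time, method, url, int(status))
    if bean.status >= 400 or url.lower().endswith(STATIC_SUFFIXES):
        bean.valid = False
    return bean

def map_clean(lines):
    """Map-only step: emit only valid records; no reducer is needed."""
    for line in lines:
        bean = parse(line)
        if bean is not None and bean.valid:
            yield bean

sample = [
    '1.2.3.4 - - [01/Dec/2021:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Dec/2021:10:00:01 +0800] "GET /css/site.css HTTP/1.1" 200 90',
    '5.6.7.8 - - [01/Dec/2021:10:00:02 +0800] "GET /missing HTTP/1.1" 404 0',
    'garbage line',
]
cleaned = [b.url for b in map_clean(sample)]
print(cleaned)
```

Because filtering needs no aggregation, a map-only job is a natural fit. In Hadoop this would correspond to setting the number of reduce tasks to zero.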
A new architecture: the integrated lakehouse (data lake plus data warehouse)
1, Version description
2, Compile and package Hudi version 0.10.0
1. Use git to clone the latest master branch from GitHub
2. Compilation and packaging
3, Create a Flink project
1. Main contents of the POM file
4, Hudi code (refer to the official ...
Posted by WebbDawg on Fri, 03 Dec 2021 03:42:39 +0100