HPL/SQL installation and common problems
Hive 1.x does not provide anything similar to stored procedures. When developing with Hive, people usually wrap a block of HQL statements in a shell or other script and call it from the command line to complete the statistical analysis for a business process or a report. The go ...
Posted by facets on Mon, 30 Sep 2019 22:36:40 +0200
Hive 2.3.0 Installation Notes
Prerequisites
Complete the Hadoop installation.
Complete the MySQL installation.
Download Hive
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz
Or download it from the official website.
Unzip to the spe ...
Posted by ubuntu-user on Wed, 18 Sep 2019 13:23:26 +0200
MapReduce custom key, partition, and counter
1. Introductory Case: WordCount
Requirement: count and output the total number of occurrences of each word in a given set of text files.
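The WordCount requirement maps onto the three MapReduce stages: the mapper emits a (word, 1) pair for every word, the framework groups the pairs by word, and the reducer sums each group. A minimal single-process sketch of that data flow, written in Python rather than the Hadoop Java API and using hypothetical sample lines (the post's file contents are elided), might look like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word, like a WordCount mapper."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; in Hadoop the framework does this between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts per word; keys are visited in sorted order, as reducer input is."""
    return {word: sum(counts) for word, counts in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["hello world", "hello hadoop", "hello hive"]  # hypothetical input
    for word, total in word_count(sample).items():
        print(f"{word}\t{total}")
```

The printed lines are tab-separated `word count` pairs in key order, which matches the shape of a WordCount job's output file.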
1. Data preparation
Create a new file
cd /export/servers
vim wordcount.txt
Put the following ...
Posted by Varma69 on Wed, 11 Sep 2019 13:34:17 +0200
Hive file formats for storing and reading data
Hive supports the following file storage formats:
TEXTFILE
SEQUENCEFILE
RCFILE
ORCFILE (since 0.11)
TEXTFILE is the default format; it is used whenever a table does not specify one. When data is imported, the data files are copied directly ...
Posted by MatrixGL on Fri, 06 Sep 2019 04:28:02 +0200
20 Enterprise Tuning 2: Table Optimization
1. Joining small and large tables
1. Definition
//Placing the table whose keys are relatively dispersed and whose data volume is small on the left side of the join effectively reduces the chance of out-of-memory errors.
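Why the small table belongs on the left: in Hive's reduce-side (common) join, the rows from the tables listed earlier in the join are buffered in memory for each join key, while the last table is streamed through the reducer. The sketch below simulates that behaviour in plain Python and reports the largest per-key buffer. The row shape, function name and sample data are hypothetical, and this is an illustration of the idea, not Hive's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # hypothetical (join_key, payload) row shape
Joined = Tuple[str, str, str]


def reduce_side_join(left: Iterable[Row], right: Iterable[Row]) -> Tuple[List[Joined], int]:
    """Inner-join two inputs the way a reduce-side join does.

    Returns the joined rows and the largest number of rows buffered in
    memory for any single key.
    """
    # "Shuffle": tag each row with its side (0 = left, 1 = right) and group by key.
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for key, value in left:
        groups[key].append((0, value))
    for key, value in right:
        groups[key].append((1, value))

    joined: List[Joined] = []
    peak = 0
    for key in sorted(groups):
        # Stable sort on the tag so every left-side row for this key arrives first.
        rows = sorted(groups[key], key=lambda tagged: tagged[0])
        buffered: List[str] = []
        for side, value in rows:
            if side == 0:
                buffered.append(value)  # left-side rows are held in memory
                peak = max(peak, len(buffered))
            else:
                for left_value in buffered:  # right-side rows stream past the buffer
                    joined.append((key, left_value, value))
    return joined, peak


if __name__ == "__main__":
    small = [("u1", "Alice"), ("u2", "Bob")]
    large = [("u1", f"order-{i}") for i in range(5)] + [("u2", "order-x")]
    rows_a, peak_small_left = reduce_side_join(small, large)
    rows_b, peak_large_left = reduce_side_join(large, small)
    print(len(rows_a), peak_small_left)  # 6 1
    print(len(rows_b), peak_large_left)  # 6 5
```

Both orderings produce the same six joined rows, but with `small` on the left at most one row per key is buffered, whereas swapping the sides buffers all five orders of `u1` at once. On real data that difference decides whether a reducer runs out of memory.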
//Further, you can use map ...
Posted by BigMike on Sun, 18 Aug 2019 04:24:46 +0200
MapReduce secondary sorting
1 Secondary sorting
1.1 Idea
Secondary sorting means that records sharing the same value in the first field are further sorted by a second field.
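A minimal sketch of the idea in plain Python, assuming (user_name, order_amount) records and a descending sort on the amount (the excerpt does not say which direction the post uses). Records are sorted on the composite key (first field, second field) and then grouped on the first field alone. In a Hadoop job these roles are typically played by a composite key class, a partitioner on the first field and a grouping comparator; the function name and parameters here are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Order = Tuple[str, float]  # hypothetical (user_name, order_amount) record


def secondary_sort(orders: List[Order], descending: bool = True) -> Dict[str, List[float]]:
    """Sort by user, then by amount within each user, and group per user."""
    sign = -1 if descending else 1
    # Composite key: first field ascending, second field in the requested direction.
    ordered = sorted(orders, key=lambda order: (order[0], sign * order[1]))
    # Group on the first field only, like a grouping comparator on the user name.
    return {
        user: [amount for _, amount in group]
        for user, group in groupby(ordered, key=lambda order: order[0])
    }


if __name__ == "__main__":
    orders = [("bob", 30.0), ("alice", 12.5), ("bob", 99.0), ("alice", 80.0), ("bob", 5.0)]
    print(secondary_sort(orders))
    # {'alice': [80.0, 12.5], 'bob': [99.0, 30.0, 5.0]}
```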
For example, an e-commerce platform records the amount of each order placed by each user. The requirement is to sort all orders belonging to the same user, and the user name of th ...
Posted by steply on Thu, 08 Aug 2019 12:00:03 +0200
Auxiliary System: Workflow Scheduler Azkaban
1. Overview
Azkaban: https://azkaban.github.io/
1.1. Why a Workflow Scheduling System Is Needed
A complete data analysis system usually consists of a large number of task units:
shell scripts, Java programs, MapReduce programs, Hive scripts, etc.
These tasks have time-order and dependency relationships ...
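The dependency problem a workflow scheduler solves can be illustrated with a topological sort: each task may start only after every task it depends on has finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical pipeline; the job names are made up, and this is not Azkaban's job-file format or its internal scheduling code.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "import_logs": set(),                           # e.g. a shell script
    "clean_logs": {"import_logs"},                  # e.g. a MapReduce program
    "load_hive": {"clean_logs"},                    # e.g. a Hive script
    "import_users": set(),                          # e.g. a Java program
    "daily_report": {"load_hive", "import_users"},
}


def run_order(pipeline: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step, job in enumerate(run_order(PIPELINE), start=1):
        print(step, job)
```

`import_logs` and `import_users` do not depend on each other, so a scheduler could run them in parallel; `TopologicalSorter` exposes that through `prepare()`, `get_ready()` and `done()` instead of a single flat order.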
Posted by mikem562 on Sat, 20 Jul 2019 04:06:51 +0200
Big Data Learning Series 5: Integrating Hive with HBase (Illustrated)
http://www.cnblogs.com/xuwujing/p/8059079.html
Introduction
In the previous article, Big Data Learning Series IV: Hadoop+Hive Environment Setup, Illustrated (Stand-alone), and the earlier Big Data Learning Series II: HBase Environment Setup (Stand-alone), the Hive and HBase environments were successfully built and teste ...
Posted by dirkbonenkamp on Sat, 18 May 2019 16:02:04 +0200
A Method of Generating Hive DDL Using Sqoop API
First, create the table.
DDL:
create table testtb (id int, content string, country string) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile;
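The post generates this DDL through the Sqoop API, but the excerpt does not show those calls, so the following is not Sqoop code. It is a hypothetical pure-Python sketch of the same idea: turn a list of source columns into a delimited TEXTFILE `create table` statement. The SQL-to-Hive type mapping, the source column types and the function name are simplified assumptions.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical mapping from source SQL types to Hive types.
SQL_TO_HIVE: Dict[str, str] = {
    "INT": "int",
    "INTEGER": "int",
    "BIGINT": "bigint",
    "VARCHAR": "string",
    "CHAR": "string",
    "TEXT": "string",
    "DOUBLE": "double",
}


def hive_text_ddl(table: str, columns: List[Tuple[str, str]], field_delim: str = "|") -> str:
    """Build a CREATE TABLE statement for a delimited TEXTFILE Hive table."""
    col_defs = ", ".join(
        f"{name} {SQL_TO_HIVE.get(sql_type.upper(), 'string')}"  # unknown types fall back to string
        for name, sql_type in columns
    )
    return (
        f"create table {table} ({col_defs}) "
        f"row format delimited fields terminated by '{field_delim}' "
        "lines terminated by '\\n' stored as textfile;"  # emits the HQL escape \n literally
    )


if __name__ == "__main__":
    # Hypothetical source column types for the testtb table.
    print(hive_text_ddl("testtb", [("id", "INT"), ("content", "VARCHAR"), ("country", "VARCHAR")]))
```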
Importing data
1. Importing with Sqoop from the shell
sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password hehe --table testtb ...
Posted by JayBachatero on Thu, 16 May 2019 20:12:19 +0200