HPL/SQL installation and common problems

Hive 1.x does not provide functionality similar to stored procedures. When developing with Hive, a block of HQL statements is usually wrapped in a shell script or another kind of script, which is then invoked from the command line to complete the statistical analysis for a business process or a report. The go ...

Posted by facets on Mon, 30 Sep 2019 22:36:40 +0200

Hive 2.3.0 Installation Notes

Preparation in advance: complete the Hadoop installation and the MySQL installation. Download Hive with wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz, or get it from the official website. Unzip to the spe ...

Posted by ubuntu-user on Wed, 18 Sep 2019 13:23:26 +0200

MapReduce custom k, partition, and counter

1. Introductory case: WordCount. Requirement: count the total number of occurrences of each word in a given set of text files and output the result. 1. Data format preparation: create a new file (cd /export/servers, then vim wordcount.txt). Put the following ...

Posted by Varma69 on Wed, 11 Sep 2019 13:34:17 +0200
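The WordCount requirement above can be illustrated without a Hadoop cluster. The following is a minimal plain-Java sketch, not Hadoop API code: it simulates the map, partition, and reduce steps in memory. The partition rule (words starting with a to m go to reducer 0, all others to reducer 1), the class `WordCountSketch`, and the empty-line counter are hypothetical choices made for illustration; in a real job these would be a `Partitioner` subclass and a MapReduce counter.

```java
import java.util.*;

// Plain-Java simulation of a WordCount job (no Hadoop dependency):
// "map" splits lines into words, a custom partition rule routes each word
// to one of two simulated reducers, and each "reducer" sums the counts.
class WordCountSketch {
    // Hypothetical custom partition rule: words whose first character is
    // 'a'..'m' (case-insensitive) go to reducer 0; everything else goes to reducer 1.
    static int partition(String word) {
        char c = Character.toLowerCase(word.charAt(0));
        return (c >= 'a' && c <= 'm') ? 0 : 1;
    }

    // Output of one simulated run: per-reducer word counts plus a counter value.
    static final class Result {
        final List<Map<String, Integer>> reducerOutputs;
        final long emptyLines; // stands in for a user-defined MapReduce counter
        Result(List<Map<String, Integer>> reducerOutputs, long emptyLines) {
            this.reducerOutputs = reducerOutputs;
            this.emptyLines = emptyLines;
        }
    }

    static Result run(List<String> lines) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        reducers.add(new TreeMap<>());
        reducers.add(new TreeMap<>());
        long emptyLines = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                emptyLines++; // increment the counter instead of emitting anything
                continue;
            }
            for (String word : trimmed.split("\\s+")) {
                // map emits (word, 1); partition picks the reducer; reduce sums the 1s
                reducers.get(partition(word)).merge(word, 1, Integer::sum);
            }
        }
        return new Result(reducers, emptyLines);
    }

    public static void main(String[] args) {
        Result r = run(Arrays.asList("hello world", "", "hello hadoop"));
        for (int i = 0; i < r.reducerOutputs.size(); i++) {
            System.out.println("reducer " + i + ": " + r.reducerOutputs.get(i));
        }
        System.out.println("empty lines: " + r.emptyLines);
    }
}
```

With the input lines `hello world`, an empty line, and `hello hadoop`, reducer 0 receives `hadoop=1, hello=2`, reducer 1 receives `world=1`, and the counter records one empty line.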

Hive formats for storing and reading files

Hive supports the following file storage formats: TEXTFILE, SEQUENCEFILE, RCFILE, and ORCFILE (since 0.11). TEXTFILE is the default format and is used when a table does not specify one. When data is imported, data files will be copied directly ...

Posted by MatrixGL on Fri, 06 Sep 2019 04:28:02 +0200

20 Enterprise Tuning 2 - Table Optimization

1. Joining small and large tables. 1. Definition: when keys are relatively dispersed, place the table with the smaller amount of data on the left side of the join; this effectively reduces the chance of out-of-memory errors. Further, you can use map ...

Posted by BigMike on Sun, 18 Aug 2019 04:24:46 +0200
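The map join mentioned above keeps the small table in memory so that the large table can be joined while it is being read, with no reduce-side shuffle. Below is a minimal plain-Java sketch of that idea (an in-memory hash join), assuming both tables are lists of `String[]` rows joined on column 0. `MapJoinSketch` and the row layout are hypothetical, and this is not Hive's implementation.

```java
import java.util.*;

class MapJoinSketch {
    // Inner-joins a large table against a small table on column 0 of each row.
    // Each output row is the large-table row followed by the small row's other columns.
    static List<String[]> mapJoin(List<String[]> small, List<String[]> large) {
        // Build phase: the small table is loaded entirely into memory, keyed by join key.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : small) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: the large table is streamed once; no shuffle or reduce step is needed.
        List<String[]> out = new ArrayList<>();
        for (String[] row : large) {
            List<String[]> matches = index.get(row[0]);
            if (matches == null) continue; // inner join: drop rows without a match
            for (String[] s : matches) {
                String[] joined = Arrays.copyOf(row, row.length + s.length - 1);
                System.arraycopy(s, 1, joined, row.length, s.length - 1);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = new ArrayList<>();
        small.add(new String[]{"1", "beijing"});
        List<String[]> large = new ArrayList<>();
        large.add(new String[]{"1", "order-a"});
        large.add(new String[]{"3", "order-b"});
        for (String[] row : mapJoin(small, large)) System.out.println(String.join("\t", row));
    }
}
```

Only the small table has to fit in memory; the large table is read row by row, which is why the smaller table is the one to load.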

MapReduce secondary sorting

1 Secondary sorting 1.1 Idea: so-called secondary sorting sorts the records that share the same value in the first field by a second field. For example, an e-commerce platform records the amount of every order placed by each user. Now all orders belonging to the same user must be sorted, and the user name of th ...

Posted by steply on Thu, 08 Aug 2019 12:00:03 +0200
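The secondary sort described above can be sketched in plain Java. This is a minimal in-memory illustration, not Hadoop code: `OrderKey` plays the role of a composite key (user, amount), `SORT` plays the role of the sort comparator, and collecting consecutive keys by user mimics a grouping comparator. Sorting amounts in descending order is an assumption, since the excerpt does not say which direction is required. In a real job, a partitioner on the user field would also be needed so that all of one user's records reach the same reducer.

```java
import java.util.*;

class SecondarySortSketch {
    // Composite key: primary field = user name, secondary field = order amount.
    static final class OrderKey {
        final String user;
        final double amount;
        OrderKey(String user, double amount) { this.user = user; this.amount = amount; }
    }

    // Sort comparator: user ascending, then amount descending (assumed direction).
    static final Comparator<OrderKey> SORT =
        Comparator.comparing((OrderKey k) -> k.user)
                  .thenComparing(Comparator.comparingDouble((OrderKey k) -> k.amount).reversed());

    // Sorts the keys, then gathers the amounts of each user in sorted order,
    // similar to what one reduce() call sees after grouping by user.
    static Map<String, List<Double>> sortAndGroup(List<OrderKey> keys) {
        List<OrderKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (OrderKey k : sorted) {
            groups.computeIfAbsent(k.user, u -> new ArrayList<>()).add(k.amount);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<OrderKey> keys = Arrays.asList(
            new OrderKey("tom", 100), new OrderKey("amy", 50),
            new OrderKey("tom", 300), new OrderKey("amy", 80));
        System.out.println(sortAndGroup(keys));
    }
}
```

For the orders in `main`, the groups come out as `amy` with `80.0, 50.0` followed by `tom` with `300.0, 100.0`.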

Auxiliary System - Workflow Scheduler Azkaban

1. Overview. Azkaban: https://azkaban.github.io/ 1.1. Why a workflow scheduling system is needed: a complete data analysis system usually consists of a large number of task units, such as shell scripts, Java programs, MapReduce programs, and Hive scripts. Time-ordering and dependency relationships ...

Posted by mikem562 on Sat, 20 Jul 2019 04:06:51 +0200
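The dependency relationships between task units mentioned above are what a workflow scheduler has to resolve before it can run anything. Below is a minimal plain-Java sketch of that resolution step using Kahn's topological sort. The job names and the `WorkflowOrderSketch` class are hypothetical, and this is not Azkaban's API or its internal scheduler.

```java
import java.util.*;

class WorkflowOrderSketch {
    // deps maps each job to the jobs it depends on (its upstream jobs).
    // Returns an order in which every job comes after all of its dependencies,
    // or throws if the dependencies contain a cycle.
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new TreeMap<>(); // TreeMap keeps ties deterministic
        Map<String, List<String>> downstream = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String up : e.getValue()) {
                inDegree.putIfAbsent(up, 0);
                inDegree.merge(e.getKey(), 1, Integer::sum);
                downstream.computeIfAbsent(up, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Jobs with no unfinished dependencies are ready to run.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            // Finishing a job releases the jobs that were waiting on it.
            for (String next : downstream.getOrDefault(job, Collections.emptyList())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("cycle in job dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("import", Collections.<String>emptyList());
        deps.put("clean", Arrays.asList("import"));
        deps.put("hive", Arrays.asList("clean"));
        deps.put("export", Arrays.asList("hive", "clean"));
        System.out.println(executionOrder(deps));
    }
}
```

For the jobs in `main`, the returned order is `import, clean, hive, export`. A cycle, such as `a` depending on `b` and `b` depending on `a`, raises an exception instead of producing an order.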

Big Data Learning Series 5 - Integrating Hive with HBase (Illustrated)

http://www.cnblogs.com/xuwujing/p/8059079.html Introduction: In the previous article, Big Data Learning Series IV - Hadoop+Hive Environment Setup, Illustrated (stand-alone), and the earlier Big Data Learning Series II - HBase Environment Setup (stand-alone), the Hive and HBase environments were successfully built and teste ...

Posted by dirkbonenkamp on Sat, 18 May 2019 16:02:04 +0200

A Method of Generating Hive DDL Using the Sqoop API

First, create the table. DDL: create table testtb (id int, content string, country string) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile; Importing data: 1. Importing with Sqoop from the shell: sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password hehe --table testtb ...

Posted by JayBachatero on Thu, 16 May 2019 20:12:19 +0200
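The table DDL quoted in the post above follows a fixed pattern that can be assembled from an ordered list of columns. The sketch below is a hypothetical helper (`HiveDdlSketch.createTableDdl`) that builds such a delimited TEXTFILE statement. It does not use or reproduce the Sqoop API that the post describes, and it assumes the column types passed in are already valid Hive types.

```java
import java.util.*;

class HiveDdlSketch {
    // Hypothetical helper (not part of Sqoop): builds a delimited TEXTFILE
    // CREATE TABLE statement from ordered column name -> Hive type pairs.
    static String createTableDdl(String table, LinkedHashMap<String, String> columns, char fieldDelimiter) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        // "\\n" in Java source produces the two characters \n, as written in HQL.
        return "create table " + table + " " + cols
            + " row format delimited fields terminated by '" + fieldDelimiter + "'"
            + " lines terminated by '\\n' stored as textfile;";
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "int");
        cols.put("content", "string");
        cols.put("country", "string");
        System.out.println(createTableDdl("testtb", cols, '|'));
    }
}
```

Called with `testtb`, the columns `id int`, `content string`, `country string`, and the delimiter `|`, it produces the same statement as the DDL quoted in the post, apart from whitespace.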