Manually install Hive 3 (using HDP as an example; also applicable to CentOS 8)
1 Environment
The following uses the HDP release (HDP-3.1.5.0-centos7-rpm.tar.gz) as an example to describe the installation and configuration of Hive 3 in detail. The environment can be CentOS 7 or CentOS 8 (although the tar package targets CentOS 7, the RPMs used in this article also work on CentOS 8).
Before installation, manually ...
Posted by taddis on Thu, 07 Oct 2021 02:42:52 +0200
Spark big data analysis practice - company sales data analysis
Requirements
Suppose a company provides you with the following data. The data consists of three .txt files: date data, order header data, and order detail data. You are asked to carry out the following analyses based on the data the company provides. 1. Calculate the annual number of sales orders and the total sales amount across all orders. 2. ...
Posted by bodzan on Wed, 06 Oct 2021 21:39:02 +0200
Hive 3.x high-availability deployment
1. Deployment planning
For the Hadoop high-availability cluster deployment, see: Hadoop 3.x distributed high-availability cluster deployment
1.1 version description
Software | Version
Operating system | CentOS Linux release 7.8.2003 (Core)
Java | jdk-8u271-linux-x64
Hadoop | hadoop-3.2.2
Hive | hive-3.1.2
1.2 cluster planning
Hive remote mode && hivese ...
Posted by ernielou on Mon, 04 Oct 2021 19:56:52 +0200
How to connect HBase stand-alone version (using external ZooKeeper) and Java API
0 Creation and revision history
Time | Content | Remarks
20210927 | Create document |
0 Versions
Component name | Version | Download address
Operating system | centos7 | CentOS-7-x86_64-DVD-2009.iso
jdk | 1.8 | jdk-8u301-linux-x64.tar.gz
hadoop | 3.0.0 | hadoop-3.0.0.tar.gz
zookeeper | 3.4.5 | zookeeper-3.4.5.tar.gz
hb ...
Posted by Savahn on Sat, 02 Oct 2021 03:07:21 +0200
Object oriented technology
Software keeps growing more complex, and the traditional procedure-oriented approach can no longer meet the needs of software development. To make software easier to maintain, people began mapping real-world entities to the modules in the software. The modules in the software not only have the ...
Posted by anshu.sah on Sat, 02 Oct 2021 02:15:34 +0200
Operation of SQL database (assignment)
Assignment requirements: 1. Browse the electronic courseware and notes (browse only, nothing to submit). 2. Complete textbook examples 2.6, 2.8, 2.10, 2.11, 2.13. 3. Complete Experiments 1 and 2 on page 280 of the textbook. 4. Complete the contents of the following experiment topics. 5. Complete the third exercise on page 23 of the textbook and convert it into a ...
Posted by DeadlySin3 on Thu, 30 Sep 2021 18:01:38 +0200
Spark Learning Achievement Transformation - machine learning - predicting music labels using Spark ML's logistic regression (multiclass classification problem)
The third example uses logistic regression in Spark ML to predict music tags
This is a multiclass classification problem, that is, the prediction can take one of many possible values. For an introduction to Spark ML and its key concepts, see: Spark ML learning notes - Spark MLlib and Spark ML.
3.1 data preparation
3.1.1 data set file ...
Posted by sanfly on Tue, 28 Sep 2021 23:38:15 +0200
A data lake in practice based on Flink + Hudi + Hive
1. Introduction
The long-awaited Hudi 0.9, the latest version, came out in September. Hudi can store massive amounts of data on top of Hadoop and supports both batch processing and stream processing on the data lake, that is, it combines offline and real-time workloads. It also provides two native semantics:
1) Update/Delete records: that is, rec ...
Posted by carnold on Mon, 27 Sep 2021 14:43:07 +0200
Production optimization of Hadoop
HDFS troubleshooting
1. NameNode fault handling
Requirement: The NameNode process has crashed and its stored data is lost. How can the NameNode be recovered?
Fault simulation
kill -9 the NameNode process
Delete the data stored by the NameNode: [codecat@hadoop102 dfs]$ rm -rf /opt/module/hadoop-3.1.3/data/dfs/name/*
Solution
Copy the data in the SecondaryNa ...
Posted by DaveMate on Mon, 27 Sep 2021 12:41:22 +0200
[Data Analyst - data analysis project case I] Case analysis of 6 million+ short-term rental records
1 Preface
1.1 data set source
The data in this case comes from real Airbnb data for Toronto from 2018 to 2019. The data set includes a listings data set of about 20,000 records, which captures all listing information in dozens of fields, including price. Another data set it contains is calendar, which ...
Posted by direction on Sat, 25 Sep 2021 12:33:12 +0200