Manually install Hive 3 (taking HDP as an example, applicable to CentOS 8)

1 Environment. This article takes the HDP release (HDP-3.1.5.0-centos7-rpm.tar.gz) as an example and walks through the installation and configuration of Hive 3 in detail. The environment can be CentOS 7 or CentOS 8 (although the tar package targets CentOS 7, the RPMs used in this article also work on CentOS 8). Before installation, manually ...

Posted by taddis on Thu, 07 Oct 2021 02:42:52 +0200

Spark big data analysis practice - company sales data analysis

Requirements: Suppose a company provides you with the following data. The data consists of three .txt files: date data, order header data, and order detail data. You are asked to carry out the following analyses on the data provided by the company. 1. Calculate the number of sales orders and the total sales for each year across all orders. 2. ...

Posted by bodzan on Wed, 06 Oct 2021 21:39:02 +0200

Hive3.X high availability deployment

1. Deployment planning. For the Hadoop high availability cluster deployment, see: Hadoop 3.x distributed high availability cluster deployment. 1.1 Version description (software: version): operating system: CentOS Linux release 7.8.2003 (Core); Java: jdk-8u271-linux-x64; Hadoop: hadoop-3.2.2; Hive: hive-3.1.2. 1.2 Cluster planning: hive remote mode && hivese ...

Posted by ernielou on Mon, 04 Oct 2021 19:56:52 +0200

How to connect hbase stand-alone version (using external zk) and java api

0 Revision history (time / content / remarks): 20210927, created document. 0 Versions (component / version / download): operating system: centos7, CentOS-7-x86_64-DVD-2009.iso; jdk: 1.8, jdk-8u301-linux-x64.tar.gz; hadoop: 3.0.0, hadoop-3.0.0.tar.gz; zookeeper: 3.4.5, zookeeper-3.4.5.tar.gz; hb ...

Posted by Savahn on Sat, 02 Oct 2021 03:07:21 +0200

Object-oriented technology

Software keeps growing more complex, and the traditional procedure-oriented approach can no longer meet the needs of software development. To make software easier to maintain, people began mapping real-world entities to modules in the software. The modules in the software not only have the ...

Posted by anshu.sah on Sat, 02 Oct 2021 02:15:34 +0200

Operation of SQL database (assignment)

Assignment requirements: 1. Read through the electronic courseware and notes (no submission required). 2. Complete textbook examples 2.6, 2.8, 2.10, 2.11, and 2.13. 3. Complete experiments 1 and 2 on page 280 of the textbook. 4. Complete the experiment topics below. 5. Complete the third exercise on page 23 of the textbook and convert it into a ...

Posted by DeadlySin3 on Thu, 30 Sep 2021 18:01:38 +0200

Spark Learning Achievement Transformation - machine learning - predicting music labels using Spark ML's logistic regression (multiclass classification problem)

The third example uses Spark ML's logistic regression to predict music tags. This is a multiclass classification problem, that is, the prediction can take one of many values. For an introduction to Spark ML and its key concepts, see: Spark ML learning notes - Spark MLlib and Spark ML. 3.1 Data preparation 3.1.1 Data set file ...

Posted by sanfly on Tue, 28 Sep 2021 23:38:15 +0200

The practice of data Lake based on flink+hudi+hive

1. Introduction. Hudi 0.9, the latest version, came out in September after much anticipation. Hudi stores massive data on top of Hadoop and supports both batch and stream processing on the data lake, that is, it combines offline and real-time processing. It also provides two native semantics: 1) Update/Delete records: that is, rec ...

Posted by carnold on Mon, 27 Sep 2021 14:43:07 +0200

Production optimization of Hadoop

HDFS troubleshooting. 1. NameNode fault handling. Requirement: the NameNode process has died and its stored data is lost. How do you recover the NameNode? Fault simulation: kill the NameNode process with kill -9. Delete the data stored by the NameNode: [codecat@hadoop102 dfs]$ rm -rf /opt/module/hadoop-3.1.3/data/dfs/name/* Solution: copy the data in the SecondaryNa ...

Posted by DaveMate on Mon, 27 Sep 2021 12:41:22 +0200

[Data Analyst - data analysis project case I] Case analysis of 6 million+ short-term rental records

1 Preface 1.1 Data set source. The data in this case is real Airbnb data for Toronto from 2018-2019. The data set contains a listing table with about 20,000 records describing every property across dozens of fields, including price. The other table in the data set is calendar, which ...

Posted by direction on Sat, 25 Sep 2021 12:33:12 +0200