Big data offline processing project: website log file collection, log splitting, upload to HDFS, and preprocessing

Introduction: This article covers the first stage of a big data offline processing project: data collection. Main contents: 1) Use Flume to collect website log file data into access.log. 2) Write a shell script to split the collected log file (otherwise access.log grows too large) and rename it to access_MM/dd/yyyy.log. ...
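The post implements the split-and-rename step with a shell script; purely as a rough sketch of the same idea in Java (the log path and date pattern below are placeholders, not taken from the article), the rotation could look like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class AccessLogRoller {
    public static void main(String[] args) throws IOException {
        // Hypothetical location: wherever Flume appends the collected access.log
        Path current = Paths.get("/data/logs/access.log");

        // Build the dated name, e.g. access_11_30_2021.log
        String stamp = LocalDate.now().format(DateTimeFormatter.ofPattern("MM_dd_yyyy"));
        Path rolled = current.resolveSibling("access_" + stamp + ".log");

        // Move today's file aside so the next collection run starts a fresh access.log;
        // the rolled file can then be uploaded, e.g. with `hdfs dfs -put`.
        Files.move(current, rolled, StandardCopyOption.ATOMIC_MOVE);
        System.out.println("Rolled " + current + " -> " + rolled);
    }
}
```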

Posted by erth on Tue, 30 Nov 2021 12:59:03 +0100

Sqoop principles and basic application

1. Introduction to Sqoop (1) Introduction: Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers". Import data: import data from MySQL, Oracle and other databases into Hadoop storage systems such as HDFS, Hive and HBase. Export data: export data from the Hadoop file system back to relational ...
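Sqoop itself is driven from the command line; purely as an illustration (the connection URL, credentials, table name and target directory below are made-up placeholders), a typical import command could be launched from Java like this:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SqoopImportExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder connection details -- replace with a real MySQL instance,
        // table name and HDFS target directory.
        List<String> cmd = Arrays.asList(
                "sqoop", "import",
                "--connect", "jdbc:mysql://localhost:3306/testdb",
                "--username", "root",
                "--password", "secret",
                "--table", "orders",
                "--target-dir", "/user/hadoop/orders",
                "-m", "1");                      // single map task for a small table

        Process p = new ProcessBuilder(cmd)
                .inheritIO()                     // stream Sqoop's console output here
                .start();
        System.exit(p.waitFor());
    }
}
```

The same arguments work unchanged as a plain `sqoop import ...` shell command; wrapping them in ProcessBuilder is only one way to script the call.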

Posted by bouncer on Fri, 26 Nov 2021 14:36:34 +0100

Running the Hadoop WordCount program locally in practice

^_^ 1. Configure local Hadoop. Hadoop 2.7.5 link: https://pan.baidu.com/s/12ef3m0CV21NhjxO7lBH0Eg, extraction code: hhhh. Unzip the downloaded Hadoop package to drive D so it is easy to find. Then right-click the computer, click Properties → Advanced system settings on the right → Environment Variables → select the Path b ...
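For reference, the job that post runs is essentially the standard Hadoop WordCount example; a minimal version (assuming the Hadoop 2.7.5 client libraries are on the classpath) looks like this:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Once the environment variables from the post are set, it can be packaged into a jar and run with `hadoop jar wordcount.jar WordCount <input dir> <output dir>` (the jar name and paths here are placeholders).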

Posted by hori76 on Fri, 05 Nov 2021 19:53:07 +0100