Oozie is an open source workflow engine framework that Cloudera contributed to Apache. It is the workflow scheduling engine for the Hadoop platform and is used to manage Hadoop jobs. This article, the first in a series, introduces Oozie's task submission phase.
We deduce the implementation f ...
Posted by cowboy_x on Tue, 30 Jun 2020 05:26:29 +0200
A tool for translating SQL statements into MapReduce programs.
Create table statement
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMI ...
Posted by mainewoods on Wed, 29 Apr 2020 14:54:07 +0200
Reference article: Apache Impala detailed installation (the most complete guide to the pitfalls)
Apache Impala detailed installation
Impala is an efficient SQL query tool provided by Cloudera that returns query results in real time. In official tests it is 10 to 100 times faster than Hive, and its SQL queries are even faster than Spark SQL. imp ...
Posted by deth4uall on Tue, 21 Apr 2020 09:18:04 +0200
The Hadoop version is 2.8.3.
Today I ran into a strange problem: as shown in List-1 below, the output indicates that two file blocks are missing.
There are 2 missing blocks. The following files may be corrupted:
blk_1073857295 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f0 ...
Posted by tomfmason on Wed, 25 Mar 2020 15:31:06 +0100
1. The hostname cannot be resolved
1. Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
2. Unable to instantiate org.a ...
Posted by a6000000 on Tue, 17 Mar 2020 02:31:41 +0100
I have a live Android application, and I received the following stack trace from the market. I have no idea why it is happening: it does not seem to occur in the application code itself, but is caused by some event in the application (this is an assumption).
I don't use Fragments, yet there is still a reference to the fragment manager. If someone can under ...
Posted by weemee500 on Sun, 01 Mar 2020 05:05:55 +0100
1. Introduction to Sqoop:
Sqoop is an open source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL).
Data in a relational database (such as MySQL, Oracle, or Postgres) can be imported into Hadoop's HDFS,
and data in HDFS can be exported into a relational database. ...
Posted by wolfrock on Wed, 26 Feb 2020 07:30:01 +0100
4. DDL data definition
4.1 create database
4.2 query database
4.3 modify database
4.4 delete database
4.5 create table
4.6 partitioned tables
4.7 modify table
5. DML data operation
5.1 data import
5.2 data export
5.3 clear data in the table (Truncate)
6.1 basic query ...
Posted by PyroX on Sun, 23 Feb 2020 09:38:46 +0100
Chapter 3: Enterprise development cases
3.1 Official case: monitoring port data
Case requirements: first, Flume listens on port 44444 of the local machine; then a message is sent to port 44444 of the same machine with the telnet tool; finally, Flume displays the received data on the console in real time.
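The "send a message to port 44444" step can also be scripted instead of typed into telnet. The following is only a minimal sketch and is not taken from the original article: `send_line` is a hypothetical helper name, and it assumes something (for example a Flume agent) is already listening for newline-terminated text on the given host and port.

```python
import socket


def send_line(message: str, host: str = "localhost", port: int = 44444) -> int:
    """Send one newline-terminated line over TCP, much as a telnet session would.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    # Make sure the payload ends with exactly one newline.
    payload = (message.rstrip("\n") + "\n").encode("utf-8")
    # Open the connection, send the whole payload, then close the socket.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
    return len(payload)
```

With a listener running on the local machine, calling `send_line("hello flume")` sends a single line to port 44444; if nothing is listening, the call raises a connection error.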
Posted by poring on Thu, 13 Feb 2020 21:26:22 +0100
Let's talk about the last step of preparing to upgrade Presto. The machine on which Presto had been deployed also hosts a CDH cluster (at the customer's site) and runs CentOS 6.x. Its JDK was installed following the Cloudera official website, so it could not be touched. Later, it was found that its JDK environment variab ...
Posted by B of W on Tue, 11 Feb 2020 11:48:16 +0100