Syntax comparison between Presto and Hive

Posted by Lord Brar on Tue, 25 Jan 2022 23:31:57 +0100

Some time conversion problems are often encountered in work:

1) log_date:20200110 needs to be converted to standard date or compared with timestamp data

2) The working environment involves Presto and hive. It is faster to check the query with presto. Therefore, it is generally necessary to convert the date with the syntax of Presto and hive at the same time

The time conversion used recently is sorted out below

  • Question 1: time format conversion (time format)

Example: the current time 20200110 is converted to 2020-01-10

--Output 2020-01-10
--hive
select to_date(from_unixtime(UNIX_TIMESTAMP('20200110','yyyyMMdd')));

--presto
select (format_datetime(date_parse('20200110','%Y%m%d'),'yyyy-MM-dd') ;

Conversion between hive string and time in various formats: [to] String to Date conversion in hive - convert various strings into date format in Hive  http://www.mamicode.com/info-detail-2587004.html

  • Question 2: addition and subtraction of time

Example: the original time is 20200110, which needs to be converted into standard date form before addition and subtraction

--hive 
select date_add('2020-01-12',10);
select date_add(to_date(from_unixtime(UNIX_TIMESTAMP('20200110','yyyyMMdd'))),10);

--presto
select date_add('day',-6,cast('2020-07-07' as date)); 
--The third parameter is not converted to date format, Error will be reported. The third parameter must be date format
select 
date_add('day',-6,cast(format_datetime(date_parse('20200110','%Y%m%d'),'yyyy-MM-dd') as date));

Date in hive_ add, date_ sub, date_ Usage of diff: datediff, date in HIve_ Add and date_ Usage of sub

date_sub and date_ The usage of add is almost the same

  • Question 3: timestamp to date
--hive
select from_unixtime(1578585600);
--Add format
select from_unixtime(1578585600,'yyyyMMdd');

--presto
select from_unixtime(1578585600);
--Add format
select format_datetime(from_unixtime(1578585600),'yyyy-MM-dd');
  1. Question 4: date to timestamp
--hive 
select unix_timestamp('20200110' ,'yyyyMMdd'); --10 Bit timestamp

-- presto 
select to_unixtime(cast('2020-01-10' as date));
select to_unixtime(cast(format_datetime(date_parse('20200110','%Y%m%d'),'yyyy-MM-dd') as d
  • Question 5: calculate the diff between two dates
--hive
select datediff('2017-09-15','2017-09-01');-- Result 14

--presto 
select date_diff('day',cast('2018-09-05' as date),cast('2018-09-07' as date));

-- 1)Parameters are required'day',Indicates the interval of days to query; To query hours, provide parameters'hour'
-- 2)And the subsequent parameters are limited to date Type;
-- 3)The last thing to note is that the back minus the front --And hive Different

json string extraction, row to column, bit operation

  1. json string extraction
--hive
select get_json_object(json, '$.book');
--Presto
select json_extract_scalar(json, '$.book');
--Pay attention here Presto in json_extract_scalar The return value is a string type,It also has a function json_extract Is to return one directly json String, so when using, you have to know what type of value you are taking.

2. Column to row

There are two fields, user_ ID, scores (respectively the user's score)

Example: yn 98,97,95

If we want to convert it to user_id, score (a score), column to row conversion is required, and the result is

yn 98

yn 97

yn 95

The corresponding sql is written as

--hive
select student, score from tests lateral view explode(split(scores, ',')) t as score;

--presto
select student, score from tests cross json unnest(split(scores, ',') as t (score);

3. Bit operation

If you want to judge whether a secondary system bit of a number is 1, you need to use in place operation

Example: judge whether the sixth bit from the right of the binary representation of a number is 1

Note that if bit 6 is 1, the result is 64

--hive
select 8 & 64 != 64; -- 0,
select 64 & 64 != 64; --64,
--Presto
select bitwise_and(64,64); --result: 64
select bitwise_and(2,64); --result: 0

presto other bit operation correlation functions
bit_count(x, bits) → bigint returns the digit of 1 in the complement of X
bitwise_and(x, y) → bigint bit and function
bitwise_not(x) → bigint operation
bitwise_or(x, y) → bigint bit or function
bitwise_xor(x, y) → bigint or function
bitwise_and_agg(x) → bigint returns the and operation results of all values in X, and X is an array
bitwise_or_agg(x) → bigint returns the or operation results of all values in X, an x-bit array

 

Topics: hive SQL