Understanding Hudi's core concepts from its persisted files

[Overview] This is the first article in the Hudi series. It starts by building an understanding of Hudi from its core concepts and stored file format; later articles will cover usage (writing to Hudi from Spark/Flink, syncing Hudi to Hive, etc.) and internals (compaction, indexing, clustering, etc.). [What is a d ...

Posted by QbertsBrother on Sun, 20 Feb 2022 05:10:02 +0100

Introduction and hands-on tutorial for SparkSQL

Abstract: Spark SQL is a Spark module for processing structured data. Unlike the Spark RDD API, Spark SQL carries more information about the structure of the data (metadata) and offers better performance. You can interact with Spark SQL through SQL and the Dataset API. This article is shared from the Huawei Cloud community: [SparkSQL notes] introduction and practice tutorial of SparkS ...

Posted by mehdi110 on Wed, 26 Jan 2022 19:56:40 +0100