Hudi of data Lake (10): Using Spark to query data in Hudi


Table of Contents

0. Links to related articles

1. Environment preparation

1.1. Setting up the server environment

1.2. Building the Maven project and writing data

2. Maven dependencies

3. Core code

3.1. Direct query

3.2. Conditional query

0. Links to related articles

Summary of articles on basic knowledge points of big data

1. Environment preparation

1.1. Setting up the server environment

For the server environment used by Spark to write data into Hudi, refer to another post in this series; only HDFS needs to be installed on CentOS 7. The post is: Hudi of data Lake (6): integrated installation and use of Hudi with Spark and HDFS

1.2. Building the Maven project and writing data

This post demonstrates using Spark code to query data from an existing Hudi table. First, build a Maven project and insert some simulated data into Hudi; the steps are covered in another post in this series: Hudi of data Lake (9): use Spark to insert data into Hudi
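
For reference, the write step looks roughly like the following. This is a minimal sketch based on Hudi's quickstart utilities, not the exact code of that post; the object name, table name, path, and field mappings are assumptions taken from the query code later in this article.

// A minimal sketch of the insert step (assumptions: object name is illustrative;
// table name, path and field names match the query demos below; Hudi 0.9.0).
import scala.collection.JavaConverters._
import org.apache.hudi.QuickstartUtils._
import org.apache.spark.sql.{SaveMode, SparkSession}

object Demo01_InsertData {

    def main(args: Array[String]): Unit = {

        val spark: SparkSession = SparkSession.builder()
            .appName(this.getClass.getSimpleName.stripSuffix("$"))
            .master("local[4]")
            .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .getOrCreate()
        import spark.implicits._

        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        // Generate simulated trip records as JSON strings and load them into a DataFrame
        val dataGen = new DataGenerator()
        val inserts = convertToStringList(dataGen.generateInserts(100))
        val insertDF = spark.read.json(spark.createDataset(inserts.asScala))

        // Write the data as a Copy-on-Write Hudi table
        insertDF.write
            .format("hudi")
            .options(getQuickstartWriteConfigs)
            .option("hoodie.datasource.write.precombine.field", "ts")
            .option("hoodie.datasource.write.recordkey.field", "uuid")
            .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
            .option("hoodie.table.name", tableName)
            .mode(SaveMode.Overwrite)
            .save(tablePath)

        spark.stop()
    }
}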

2. Maven dependencies

The Maven dependencies already appear in that post, but they are repeated here for completeness:

    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
        <repository>
            <id>jboss</id>
            <url>http://repository.jboss.com/nexus/content/groups/public</url>
        </repository>
    </repositories>
 
    <properties>
        <scala.version>2.12.10</scala.version>
        <scala.binary.version>2.12</scala.binary.version>
        <spark.version>3.0.0</spark.version>
        <hadoop.version>2.7.3</hadoop.version>
        <hudi.version>0.9.0</hudi.version>
    </properties>
 
    <dependencies>
 
        <!-- Scala language dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
 
        <!-- Spark Core dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark SQL dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
 
        <!-- Hadoop Client dependency -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
 
        <!-- hudi-spark3 -->
        <dependency>
            <groupId>org.apache.hudi</groupId>
            <artifactId>hudi-spark3-bundle_2.12</artifactId>
            <version>${hudi.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
 
    </dependencies>
 
    <build>
        <outputDirectory>target/classes</outputDirectory>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
        <resources>
            <resource>
                <directory>${project.basedir}/src/main/resources</directory>
            </resource>
        </resources>
        <!-- Maven compiler plugins -->
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

3. Core code

3.1. Direct query

Use a snapshot query to read data from the Hudi table, then analyze it with the DataFrame DSL according to business needs:

package com.ouyang.hudi.crud

import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * @date 2022/2/23
 * @author yangshibiao
 * @desc Snapshot query of Hudi table data, using the DataFrame DSL
 */
object Demo02_SnapshotQuery {

    def main(args: Array[String]): Unit = {

        // Create a SparkSession instance object and set properties
        val spark: SparkSession = {
            SparkSession.builder()
                .appName(this.getClass.getSimpleName.stripSuffix("$"))
                .master("local[4]")
                // Set serialization method: Kryo
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate()
        }

        // Define variables: table name, save path
        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        // Import Spark implicits to enable the $"column" syntax used below
        import spark.implicits._

        val tripsDF: DataFrame = spark.read.format("hudi").load(tablePath)
        tripsDF.printSchema()
        tripsDF.show(10, truncate = false)

        // Fare between 20 and 50 (inclusive)
        tripsDF
            .filter($"fare" >= 20 && $"fare" <= 50)
            .select($"driver", $"rider", $"fare", $"begin_lat", $"begin_lon", $"partitionpath", $"_hoodie_commit_time")
            .orderBy($"fare".desc, $"_hoodie_commit_time".desc)
            .show(100, truncate = false)
    }
}

Run the above code to query all the data under the table path; it prints the schema and part of the data, as shown below:

root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- begin_lat: double (nullable = true)
 |-- begin_lon: double (nullable = true)
 |-- driver: string (nullable = true)
 |-- end_lat: double (nullable = true)
 |-- end_lon: double (nullable = true)
 |-- fare: double (nullable = true)
 |-- rider: string (nullable = true)
 |-- ts: long (nullable = true)
 |-- uuid: string (nullable = true)
 |-- partitionpath: string (nullable = true)

+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key                  |_hoodie_partition_path              |_hoodie_file_name                                                    |begin_lat           |begin_lon          |driver    |end_lat            |end_lon            |fare              |rider    |ts           |uuid                                |partitionpath                       |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|20220223222328     |20220223222328_1_33 |bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5655712287397079  |0.8032800489802543 |driver-213|0.18240785532240533|0.869159296395892  |92.0536330577404  |rider-213|1645625676345|bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_34 |99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.6626987497394154  |0.22504711188369042|driver-213|0.35712946224267583|0.244841817279154  |10.72756362186601 |rider-213|1645326839179|99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_35 |bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.11488393157088261 |0.6273212202489661 |driver-213|0.7454678537511295 |0.3954939864908973 |27.79478688582596 |rider-213|1645094601577|bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_36 |59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5751612868373159  |0.46940431249093517|driver-213|0.6855658616896665 |0.12686440203574556|11.212022663263122|rider-213|1645283606578|59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_37 |5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.1856488085068272  |0.9694586417848392 |driver-213|0.38186367037201974|0.25252652214479043|33.92216483948643 |rider-213|1645133755620|5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_38 |d64b94ec-d8e8-44f3-a5c0-e205e034aa5d|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5731835407930634  |0.4923479652912024 |driver-213|0.08988581780930216|0.42520899698713666|64.27696295884016 |rider-213|1645298902122|d64b94ec-d8e8-44f3-a5c0-e205e034aa5d|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_39 |f0d208fb-b5aa-4236-acbc-a6ec283c5693|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.30057620949299213 |0.3883212395069259 |driver-213|0.8529563766655098 |0.18417876489592633|57.62896261799536 |rider-213|1645483784517|f0d208fb-b5aa-4236-acbc-a6ec283c5693|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_40 |61602de6-6839-4eb2-88ed-75fdf28bbd1f|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.023755167724156978|0.6322099740212305 |driver-213|0.2171902015800108 |0.2132173852420407 |15.330847537835645|rider-213|1645026565110|61602de6-6839-4eb2-88ed-75fdf28bbd1f|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_41 |6b8c7cdd-0302-4110-bced-a996d56828e8|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5692544178629111  |0.610843492129245  |driver-213|0.366234158145209  |0.2051302267345806 |77.05976291070496 |rider-213|1645519660912|6b8c7cdd-0302-4110-bced-a996d56828e8|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_42 |3732e4e6-2095-4eb8-903b-8daf3d307607|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|9.544772278234914E-4|0.7150696027624646 |driver-213|0.4142563844059821 |0.1214902298018885 |24.65031205441023 |rider-213|1645112245071|3732e4e6-2095-4eb8-903b-8daf3d307607|americas/united_states/san_francisco|
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
only showing top 10 rows

You can then use the DSL to filter the results in Spark; the output is as follows:

+----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+
|driver    |rider    |fare              |begin_lat           |begin_lon          |partitionpath                       |_hoodie_commit_time|
+----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+
|driver-213|rider-213|49.899171213436844|0.49054633351061006 |0.8716474406347761 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|49.57985534250222 |0.13036108279724024 |0.2365242449257826 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|49.121690071563506|0.3880100101379198  |0.8750494376540229 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|46.971815642308016|0.6325393869124881  |0.7723215898397776 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|46.65992353549729 |0.9924142645535157  |0.3157934820865995 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|44.839244944180244|0.6372504913279929  |0.04241635032425073|americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|43.4923811219014  |0.6100070562136587  |0.8779402295427752 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|42.76921664939422 |0.20404106962358204 |0.41452263884832685|americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|42.46412330377599 |0.8918316400031095  |0.11580010866153201|americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|41.076686078636236|0.5712378196458244  |0.4559336764388273 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|41.06290929046368 |0.651058505660742   |0.8192868687714224 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|40.211140833035394|0.9090538095331541  |0.8801105093619153 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|39.31163975206524 |0.7548086309564753  |0.9049457113019617 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|38.697902072535484|0.9199515909032545  |0.2895800693712469 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|38.61457381408665 |0.39253605282983284 |0.5761097193536119 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|34.158284716382845|0.4726905879569653  |0.46157858450465483|americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|33.92216483948643 |0.1856488085068272  |0.9694586417848392 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|31.32477949501916 |0.7267793086410466  |0.2202009625132143 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|30.80177695413958 |0.3613216010259426  |0.8750683366449247 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|30.47844781909017 |0.10509642405359532 |0.07682825311613706|asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|30.24821012722806 |0.6437496229932878  |0.3259549255934986 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|28.874644702723472|0.04316839215753254 |0.49689215534636744|americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|28.53709038726113 |0.132849613764075   |0.2370254092732652 |asia/india/chennai                  |20220223222328     |
|driver-213|rider-213|27.911375263393268|0.9461601725825765  |0.07097928915812768|americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|27.79478688582596 |0.11488393157088261 |0.6273212202489661 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|27.66236301605771 |0.7527035644196625  |0.7525032121800279 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|25.216729525590676|0.48687190581855855 |0.03482702091010481|americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|24.65031205441023 |9.544772278234914E-4|0.7150696027624646 |americas/united_states/san_francisco|20220223222328     |
|driver-213|rider-213|22.991770617403628|0.699025398548803   |0.8105360506582145 |americas/brazil/sao_paulo           |20220223222328     |
|driver-213|rider-213|22.85729206746916 |0.5378950285504629  |0.14011059922351543|americas/brazil/sao_paulo           |20220223222328     |
+----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+
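
As an alternative to the DataFrame DSL above, the same filter can be written in Spark SQL by registering the snapshot DataFrame as a temporary view. A minimal sketch (the view name is arbitrary):

// Register the snapshot DataFrame as a temporary view and filter it with SQL
tripsDF.createOrReplaceTempView("tmp_trips_cow")
spark.sql(
    """
      |SELECT driver, rider, fare, begin_lat, begin_lon, partitionpath, _hoodie_commit_time
      |FROM tmp_trips_cow
      |WHERE fare >= 20 AND fare <= 50
      |ORDER BY fare DESC, _hoodie_commit_time DESC
      |""".stripMargin
).show(100, truncate = false)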

3.2. Conditional query

When querying Hudi table data, you can also query as of a point in time by setting the option "as.of.instant". The value can be given in either format, "20220223222328" or "2022-02-23 22:23:28", and only data committed at or before that instant is returned.

The specific code is as follows:

package com.ouyang.hudi.crud

import org.apache.spark.sql.SparkSession

/**
 * @date 2022/2/23
 * @author yangshibiao
 * @desc Point-in-time query of Hudi table data via the "as.of.instant" option
 */
// The object name here is assumed (the snapshot demo above already uses Demo02_SnapshotQuery)
object Demo03_PointInTimeQuery {

    def main(args: Array[String]): Unit = {

        // Create a SparkSession instance object and set properties
        val spark: SparkSession = {
            SparkSession.builder()
                .appName(this.getClass.getSimpleName.stripSuffix("$"))
                .master("local[4]")
                // Set serialization method: Kryo
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate()
        }

        // Define variables: table name, save path
        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        import org.apache.spark.sql.functions._

        // Method 1: specify the instant as a compact timestamp string (yyyyMMddHHmmss)
        val df1 = spark.read
            .format("hudi")
            .option("as.of.instant", "20220223222328")
            .load(tablePath)
            .sort(col("_hoodie_commit_time").desc)
        df1.printSchema()
        df1.show(numRows = 5, truncate = false)

        println("==================== Split line ====================")

        // Method 2: specify the instant as a formatted date-time string (yyyy-MM-dd HH:mm:ss)
        val df2 = spark.read
            .format("hudi")
            .option("as.of.instant", "2022-02-23 22:23:28")
            .load(tablePath)
            .sort(col("_hoodie_commit_time").desc)
        df2.printSchema()
        df2.show(numRows = 5, truncate = false)
    }
}

The printed schema and part of the data are as follows:

root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- begin_lat: double (nullable = true)
 |-- begin_lon: double (nullable = true)
 |-- driver: string (nullable = true)
 |-- end_lat: double (nullable = true)
 |-- end_lon: double (nullable = true)
 |-- fare: double (nullable = true)
 |-- rider: string (nullable = true)
 |-- ts: long (nullable = true)
 |-- uuid: string (nullable = true)
 |-- partitionpath: string (nullable = true)

+-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key                  |_hoodie_partition_path|_hoodie_file_name                                                    |begin_lat          |begin_lon         |driver    |end_lat             |end_lon            |fare             |rider    |ts           |uuid                                |partitionpath     |
+-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+
|20220223222328     |20220223222328_2_43 |c7c3c014-0dc4-42e3-a674-020ffc29a028|asia/india/chennai    |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.03154543220118411|0.2887009329948117|driver-213|0.7883536904111458  |0.629523587592623  |86.92639065900747|rider-213|1645123906580|c7c3c014-0dc4-42e3-a674-020ffc29a028|asia/india/chennai|
|20220223222328     |20220223222328_2_45 |c59fa19a-b76a-4477-8015-a49615305292|asia/india/chennai    |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.4805271604136475 |0.8630157667444018|driver-213|0.3272256283194892  |0.6298100777642365 |99.46343958295148|rider-213|1645259758661|c59fa19a-b76a-4477-8015-a49615305292|asia/india/chennai|
|20220223222328     |20220223222328_2_47 |1c73e11f-19f0-48cf-ba76-b79a75af9fd7|asia/india/chennai    |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.7413486368980094 |0.9417400045187958|driver-213|0.03903494276309427 |0.12892252065489862|5.585015784895486|rider-213|1645511312485|1c73e11f-19f0-48cf-ba76-b79a75af9fd7|asia/india/chennai|
|20220223222328     |20220223222328_2_49 |80e12a32-f802-469a-a072-f92d1ed1ca11|asia/india/chennai    |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.132849613764075  |0.2370254092732652|driver-213|0.012105237836192995|0.9180654821797201 |28.53709038726113|rider-213|1645556382792|80e12a32-f802-469a-a072-f92d1ed1ca11|asia/india/chennai|
|20220223222328     |20220223222328_2_50 |bb60dcb8-618c-444b-98ad-c22d0a128f33|asia/india/chennai    |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.770028447157646  |0.730140741480257 |driver-213|0.2776410021076544  |0.02677801967450366|8.123010514625829|rider-213|1645461203317|bb60dcb8-618c-444b-98ad-c22d0a128f33|asia/india/chennai|
+-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+
only showing top 5 rows

==================== Split line ====================
root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- begin_lat: double (nullable = true)
 |-- begin_lon: double (nullable = true)
 |-- driver: string (nullable = true)
 |-- end_lat: double (nullable = true)
 |-- end_lon: double (nullable = true)
 |-- fare: double (nullable = true)
 |-- rider: string (nullable = true)
 |-- ts: long (nullable = true)
 |-- uuid: string (nullable = true)
 |-- partitionpath: string (nullable = true)

+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key                  |_hoodie_partition_path              |_hoodie_file_name                                                    |begin_lat          |begin_lon          |driver    |end_lat            |end_lon            |fare              |rider    |ts           |uuid                                |partitionpath                       |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|20220223222328     |20220223222328_1_33 |bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5655712287397079 |0.8032800489802543 |driver-213|0.18240785532240533|0.869159296395892  |92.0536330577404  |rider-213|1645625676345|bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_34 |99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.6626987497394154 |0.22504711188369042|driver-213|0.35712946224267583|0.244841817279154  |10.72756362186601 |rider-213|1645326839179|99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_35 |bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.11488393157088261|0.6273212202489661 |driver-213|0.7454678537511295 |0.3954939864908973 |27.79478688582596 |rider-213|1645094601577|bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_36 |59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5751612868373159 |0.46940431249093517|driver-213|0.6855658616896665 |0.12686440203574556|11.212022663263122|rider-213|1645283606578|59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|
|20220223222328     |20220223222328_1_37 |5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.1856488085068272 |0.9694586417848392 |driver-213|0.38186367037201974|0.25252652214479043|33.92216483948643 |rider-213|1645133755620|5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
only showing top 5 rows

Note: the Hudi series of posts is based on my notes from studying the Hudi official website, with personal understanding added. Please bear with any deficiencies ☺☺☺

Note: other related articles (all big-data posts, including the Hudi series) are linked here -> Summary of articles on basic knowledge points of big data

Topics: Big Data, Spark, Hudi