0. Links to related articles
Summary of articles on basic knowledge points of big data
1. Environment preparation
1.1. Building the server environment
To build a server environment in which Spark can write data to Hudi, refer to an earlier post; only HDFS needs to be installed on CentOS 7. Blog link: Hudi of data Lake (6): integrated installation and use of Hudi with Spark and HDFS
1.2. Building the Maven project and writing data
This post demonstrates how to use Spark code to query data already stored in Hudi tables. First build a Maven project and insert some simulated data into Hudi; both steps are covered in an earlier post. Blog link: Hudi of data Lake (9): use Spark to insert data into Hudi
2. Maven dependencies
The Maven dependencies also appear in the earlier post, but they are repeated here for completeness:
<repositories>
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>jboss</id>
        <url>http://repository.jboss.com/nexus/content/groups/public</url>
    </repository>
</repositories>

<properties>
    <scala.version>2.12.10</scala.version>
    <scala.binary.version>2.12</scala.binary.version>
    <spark.version>3.0.0</spark.version>
    <hadoop.version>2.7.3</hadoop.version>
    <hudi.version>0.9.0</hudi.version>
</properties>

<dependencies>
    <!-- Scala language dependency -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <!-- Spark Core dependency -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Spark SQL dependency -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Hadoop Client dependency -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <!-- hudi-spark3 -->
    <dependency>
        <groupId>org.apache.hudi</groupId>
        <artifactId>hudi-spark3-bundle_2.12</artifactId>
        <version>${hudi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-avro_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>

<build>
    <outputDirectory>target/classes</outputDirectory>
    <testOutputDirectory>target/test-classes</testOutputDirectory>
    <resources>
        <resource>
            <directory>${project.basedir}/src/main/resources</directory>
        </resource>
    </resources>
    <!-- Maven compiler plugins -->
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.0</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
3. Core code
3.1. Direct query
Use a Snapshot Query to read data from the Hudi table with DataFrame DSL code, then analyze it as the business requires.
package com.ouyang.hudi.crud

import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * @ date: 2022/2/23
 * @ author: yangshibiao
 * @ desc: Snapshot Query in DataFrame DSL style
 */
object Demo02_SnapshotQuery {

    def main(args: Array[String]): Unit = {

        // Create a SparkSession instance and set its properties
        val spark: SparkSession = SparkSession.builder()
            .appName(this.getClass.getSimpleName.stripSuffix("$"))
            .master("local[4]")
            // Use Kryo serialization
            .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .getOrCreate()

        // Define variables: table name and save path
        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        // Load the Hudi table from its path (snapshot query is the default mode)
        import spark.implicits._
        val tripsDF: DataFrame = spark.read.format("hudi").load(tablePath)
        tripsDF.printSchema()
        tripsDF.show(10, truncate = false)

        // Trips whose fare is between 20 and 50 (inclusive)
        tripsDF
            .filter($"fare" >= 20 && $"fare" <= 50)
            .select($"driver", $"rider", $"fare", $"begin_lat", $"begin_lon", $"partitionpath", $"_hoodie_commit_time")
            .orderBy($"fare".desc, $"_hoodie_commit_time".desc)
            .show(100, truncate = false)
    }
}
Run the code above; it queries all the data under the path and prints the schema along with part of the data, as shown below:
root |-- _hoodie_commit_time: string (nullable = true) |-- _hoodie_commit_seqno: string (nullable = true) |-- _hoodie_record_key: string (nullable = true) |-- _hoodie_partition_path: string (nullable = true) |-- _hoodie_file_name: string (nullable = true) |-- begin_lat: double (nullable = true) |-- begin_lon: double (nullable = true) |-- driver: string (nullable = true) |-- end_lat: double (nullable = true) |-- end_lon: double (nullable = true) |-- fare: double (nullable = true) |-- rider: string (nullable = true) |-- ts: long (nullable = true) |-- uuid: string (nullable = true) |-- partitionpath: string (nullable = true) +-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key |_hoodie_partition_path |_hoodie_file_name |begin_lat |begin_lon |driver |end_lat |end_lon |fare |rider |ts |uuid |partitionpath | +-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ |20220223222328 |20220223222328_1_33 |bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5655712287397079 |0.8032800489802543 |driver-213|0.18240785532240533|0.869159296395892 |92.0536330577404 
|rider-213|1645625676345|bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_34 |99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.6626987497394154 |0.22504711188369042|driver-213|0.35712946224267583|0.244841817279154 |10.72756362186601 |rider-213|1645326839179|99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_35 |bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.11488393157088261 |0.6273212202489661 |driver-213|0.7454678537511295 |0.3954939864908973 |27.79478688582596 |rider-213|1645094601577|bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_36 |59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5751612868373159 |0.46940431249093517|driver-213|0.6855658616896665 |0.12686440203574556|11.212022663263122|rider-213|1645283606578|59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_37 |5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.1856488085068272 |0.9694586417848392 |driver-213|0.38186367037201974|0.25252652214479043|33.92216483948643 |rider-213|1645133755620|5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_38 |d64b94ec-d8e8-44f3-a5c0-e205e034aa5d|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5731835407930634 |0.4923479652912024 |driver-213|0.08988581780930216|0.42520899698713666|64.27696295884016 
|rider-213|1645298902122|d64b94ec-d8e8-44f3-a5c0-e205e034aa5d|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_39 |f0d208fb-b5aa-4236-acbc-a6ec283c5693|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.30057620949299213 |0.3883212395069259 |driver-213|0.8529563766655098 |0.18417876489592633|57.62896261799536 |rider-213|1645483784517|f0d208fb-b5aa-4236-acbc-a6ec283c5693|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_40 |61602de6-6839-4eb2-88ed-75fdf28bbd1f|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.023755167724156978|0.6322099740212305 |driver-213|0.2171902015800108 |0.2132173852420407 |15.330847537835645|rider-213|1645026565110|61602de6-6839-4eb2-88ed-75fdf28bbd1f|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_41 |6b8c7cdd-0302-4110-bced-a996d56828e8|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5692544178629111 |0.610843492129245 |driver-213|0.366234158145209 |0.2051302267345806 |77.05976291070496 |rider-213|1645519660912|6b8c7cdd-0302-4110-bced-a996d56828e8|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_42 |3732e4e6-2095-4eb8-903b-8daf3d307607|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|9.544772278234914E-4|0.7150696027624646 |driver-213|0.4142563844059821 |0.1214902298018885 |24.65031205441023 |rider-213|1645112245071|3732e4e6-2095-4eb8-903b-8daf3d307607|americas/united_states/san_francisco| 
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ only showing top 10 rows
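As an aside, the `_hoodie_file_name` values in the output above follow the pattern `<fileGroupId>_<writeToken>_<commitTime>.parquet`, so the commit time in the file name matches the `_hoodie_commit_time` column. A small pure-Scala sketch (based only on the naming observed above, not an official Hudi API) makes the parsing explicit:

```scala
// Illustrative parsing of a Hudi data file name, based on the pattern
// observed in the query output: <fileGroupId>_<writeToken>_<commitTime>.parquet
def commitTimeOf(hoodieFileName: String): String =
  hoodieFileName.split('_').last.stripSuffix(".parquet")

val name = "42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet"
println(commitTimeOf(name)) // 20220223222328
```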
You can then use Spark's DSL syntax to filter the results. The printed results are as follows:
+----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+ |driver |rider |fare |begin_lat |begin_lon |partitionpath |_hoodie_commit_time| +----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+ |driver-213|rider-213|49.899171213436844|0.49054633351061006 |0.8716474406347761 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|49.57985534250222 |0.13036108279724024 |0.2365242449257826 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|49.121690071563506|0.3880100101379198 |0.8750494376540229 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|46.971815642308016|0.6325393869124881 |0.7723215898397776 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|46.65992353549729 |0.9924142645535157 |0.3157934820865995 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|44.839244944180244|0.6372504913279929 |0.04241635032425073|americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|43.4923811219014 |0.6100070562136587 |0.8779402295427752 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|42.76921664939422 |0.20404106962358204 |0.41452263884832685|americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|42.46412330377599 |0.8918316400031095 |0.11580010866153201|americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|41.076686078636236|0.5712378196458244 |0.4559336764388273 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|41.06290929046368 |0.651058505660742 |0.8192868687714224 |asia/india/chennai |20220223222328 | |driver-213|rider-213|40.211140833035394|0.9090538095331541 |0.8801105093619153 |asia/india/chennai |20220223222328 | |driver-213|rider-213|39.31163975206524 |0.7548086309564753 |0.9049457113019617 |asia/india/chennai |20220223222328 | 
|driver-213|rider-213|38.697902072535484|0.9199515909032545 |0.2895800693712469 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|38.61457381408665 |0.39253605282983284 |0.5761097193536119 |asia/india/chennai |20220223222328 | |driver-213|rider-213|34.158284716382845|0.4726905879569653 |0.46157858450465483|americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|33.92216483948643 |0.1856488085068272 |0.9694586417848392 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|31.32477949501916 |0.7267793086410466 |0.2202009625132143 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|30.80177695413958 |0.3613216010259426 |0.8750683366449247 |asia/india/chennai |20220223222328 | |driver-213|rider-213|30.47844781909017 |0.10509642405359532 |0.07682825311613706|asia/india/chennai |20220223222328 | |driver-213|rider-213|30.24821012722806 |0.6437496229932878 |0.3259549255934986 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|28.874644702723472|0.04316839215753254 |0.49689215534636744|americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|28.53709038726113 |0.132849613764075 |0.2370254092732652 |asia/india/chennai |20220223222328 | |driver-213|rider-213|27.911375263393268|0.9461601725825765 |0.07097928915812768|americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|27.79478688582596 |0.11488393157088261 |0.6273212202489661 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|27.66236301605771 |0.7527035644196625 |0.7525032121800279 |americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|25.216729525590676|0.48687190581855855 |0.03482702091010481|americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|24.65031205441023 |9.544772278234914E-4|0.7150696027624646 |americas/united_states/san_francisco|20220223222328 | |driver-213|rider-213|22.991770617403628|0.699025398548803 |0.8105360506582145 
|americas/brazil/sao_paulo |20220223222328 | |driver-213|rider-213|22.85729206746916 |0.5378950285504629 |0.14011059922351543|americas/brazil/sao_paulo |20220223222328 | +----------+---------+------------------+--------------------+-------------------+------------------------------------+-------------------+
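The same fare-range filter can also be written in SQL by registering the DataFrame as a temporary view. A minimal self-contained sketch (it substitutes a small in-memory DataFrame for the Hudi table, since only the view mechanics are being shown; with the real table you would load `tripsDF` with `format("hudi")` as above):

```scala
import org.apache.spark.sql.SparkSession

// Minimal local sketch: express the fare-range filter in SQL via a temp view.
val spark = SparkSession.builder()
  .appName("SqlFilterSketch")
  .master("local[2]")
  .getOrCreate()
import spark.implicits._

// Stand-in for the Hudi table, with matching column names
val tripsDF = Seq(
  ("driver-213", "rider-213", 49.8),
  ("driver-213", "rider-213", 10.7),
  ("driver-213", "rider-213", 33.9)
).toDF("driver", "rider", "fare")

tripsDF.createOrReplaceTempView("tbl_trips_cow")
val filtered = spark.sql(
  "SELECT driver, rider, fare FROM tbl_trips_cow " +
  "WHERE fare >= 20 AND fare <= 50 ORDER BY fare DESC")
val matched = filtered.count()  // 2 of the 3 sample rows fall in [20, 50]
filtered.show(truncate = false)
spark.stop()
```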
3.2. Conditional query
When querying Hudi table data, you can also filter by time: set the option "as.of.instant" to a value in either the "20220223222328" or the "2022-02-23 22:23:28" format, and only the data committed at or before that instant is returned.
The specific code is as follows:
package com.ouyang.hudi.crud

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/**
 * @ date: 2022/2/23
 * @ author: yangshibiao
 * @ desc: Point-in-time query with "as.of.instant", in DataFrame DSL style
 */
object Demo02_SnapshotQuery {

    def main(args: Array[String]): Unit = {

        // Create a SparkSession instance and set its properties
        val spark: SparkSession = SparkSession.builder()
            .appName(this.getClass.getSimpleName.stripSuffix("$"))
            .master("local[4]")
            // Use Kryo serialization
            .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .getOrCreate()

        // Define variables: table name and save path
        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        // Method 1: compact timestamp string, filters the data by date and time
        val df1 = spark.read
            .format("hudi")
            .option("as.of.instant", "20220223222328")
            .load(tablePath)
            .sort(col("_hoodie_commit_time").desc)
        df1.printSchema()
        df1.show(numRows = 5, truncate = false)

        println("==================== Split line ====================")

        // Method 2: human-readable timestamp string, same filtering by date and time
        val df2 = spark.read
            .format("hudi")
            .option("as.of.instant", "2022-02-23 22:23:28")
            .load(tablePath)
            .sort(col("_hoodie_commit_time").desc)
        df2.printSchema()
        df2.show(numRows = 5, truncate = false)
    }
}
The printed schema and a sample of the data are shown below:
root |-- _hoodie_commit_time: string (nullable = true) |-- _hoodie_commit_seqno: string (nullable = true) |-- _hoodie_record_key: string (nullable = true) |-- _hoodie_partition_path: string (nullable = true) |-- _hoodie_file_name: string (nullable = true) |-- begin_lat: double (nullable = true) |-- begin_lon: double (nullable = true) |-- driver: string (nullable = true) |-- end_lat: double (nullable = true) |-- end_lon: double (nullable = true) |-- fare: double (nullable = true) |-- rider: string (nullable = true) |-- ts: long (nullable = true) |-- uuid: string (nullable = true) |-- partitionpath: string (nullable = true) +-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key |_hoodie_partition_path|_hoodie_file_name |begin_lat |begin_lon |driver |end_lat |end_lon |fare |rider |ts |uuid |partitionpath | +-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+ |20220223222328 |20220223222328_2_43 |c7c3c014-0dc4-42e3-a674-020ffc29a028|asia/india/chennai |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.03154543220118411|0.2887009329948117|driver-213|0.7883536904111458 |0.629523587592623 |86.92639065900747|rider-213|1645123906580|c7c3c014-0dc4-42e3-a674-020ffc29a028|asia/india/chennai| |20220223222328 |20220223222328_2_45 |c59fa19a-b76a-4477-8015-a49615305292|asia/india/chennai 
|7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.4805271604136475 |0.8630157667444018|driver-213|0.3272256283194892 |0.6298100777642365 |99.46343958295148|rider-213|1645259758661|c59fa19a-b76a-4477-8015-a49615305292|asia/india/chennai| |20220223222328 |20220223222328_2_47 |1c73e11f-19f0-48cf-ba76-b79a75af9fd7|asia/india/chennai |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.7413486368980094 |0.9417400045187958|driver-213|0.03903494276309427 |0.12892252065489862|5.585015784895486|rider-213|1645511312485|1c73e11f-19f0-48cf-ba76-b79a75af9fd7|asia/india/chennai| |20220223222328 |20220223222328_2_49 |80e12a32-f802-469a-a072-f92d1ed1ca11|asia/india/chennai |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.132849613764075 |0.2370254092732652|driver-213|0.012105237836192995|0.9180654821797201 |28.53709038726113|rider-213|1645556382792|80e12a32-f802-469a-a072-f92d1ed1ca11|asia/india/chennai| |20220223222328 |20220223222328_2_50 |bb60dcb8-618c-444b-98ad-c22d0a128f33|asia/india/chennai |7a997a16-fd0c-48b5-95dd-d50e5216dbab-0_2-28-30_20220223222328.parquet|0.770028447157646 |0.730140741480257 |driver-213|0.2776410021076544 |0.02677801967450366|8.123010514625829|rider-213|1645461203317|bb60dcb8-618c-444b-98ad-c22d0a128f33|asia/india/chennai| +-------------------+--------------------+------------------------------------+----------------------+---------------------------------------------------------------------+-------------------+------------------+----------+--------------------+-------------------+-----------------+---------+-------------+------------------------------------+------------------+ only showing top 5 rows ==================== Split line ==================== root |-- _hoodie_commit_time: string (nullable = true) |-- _hoodie_commit_seqno: string (nullable = true) |-- _hoodie_record_key: string (nullable = true) |-- _hoodie_partition_path: string (nullable = true) |-- _hoodie_file_name: string 
(nullable = true) |-- begin_lat: double (nullable = true) |-- begin_lon: double (nullable = true) |-- driver: string (nullable = true) |-- end_lat: double (nullable = true) |-- end_lon: double (nullable = true) |-- fare: double (nullable = true) |-- rider: string (nullable = true) |-- ts: long (nullable = true) |-- uuid: string (nullable = true) |-- partitionpath: string (nullable = true) +-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key |_hoodie_partition_path |_hoodie_file_name |begin_lat |begin_lon |driver |end_lat |end_lon |fare |rider |ts |uuid |partitionpath | +-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ |20220223222328 |20220223222328_1_33 |bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5655712287397079 |0.8032800489802543 |driver-213|0.18240785532240533|0.869159296395892 |92.0536330577404 |rider-213|1645625676345|bd6d99d0-107e-4891-9da6-f243b51323bc|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_34 |99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.6626987497394154 
|0.22504711188369042|driver-213|0.35712946224267583|0.244841817279154 |10.72756362186601 |rider-213|1645326839179|99bb3a25-669f-4d55-a36f-4ae0b76f76de|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_35 |bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.11488393157088261|0.6273212202489661 |driver-213|0.7454678537511295 |0.3954939864908973 |27.79478688582596 |rider-213|1645094601577|bd4ae628-3885-4b26-8a50-c14f8e42a265|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_36 |59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.5751612868373159 |0.46940431249093517|driver-213|0.6855658616896665 |0.12686440203574556|11.212022663263122|rider-213|1645283606578|59d2ddd0-e836-4443-a816-0ce489c004f2|americas/united_states/san_francisco| |20220223222328 |20220223222328_1_37 |5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco|42e6c711-76e7-4b7c-a6d9-80b1e7aa61a1-0_1-28-29_20220223222328.parquet|0.1856488085068272 |0.9694586417848392 |driver-213|0.38186367037201974|0.25252652214479043|33.92216483948643 |rider-213|1645133755620|5d149bc7-78a8-46df-b2b0-a038dc79e378|americas/united_states/san_francisco| +-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+-------------------+-------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+ only showing top 5 rows
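As the two runs above show, "20220223222328" and "2022-02-23 22:23:28" name the same instant; the compact form is simply the timestamp with its separators removed. A small pure-Scala helper (hypothetical, not part of Hudi) makes the correspondence explicit:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper: converts the human-readable "yyyy-MM-dd HH:mm:ss" form
// into the compact "yyyyMMddHHmmss" commit-time form used by Hudi instants.
def toCompactInstant(readable: String): String = {
  val in  = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val out = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")
  LocalDateTime.parse(readable, in).format(out)
}

println(toCompactInstant("2022-02-23 22:23:28")) // 20220223222328
```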
Note: the Hudi series of blog posts is based on my study notes from Hudi's official documentation, with personal understanding added. Please bear with any shortcomings ☺☺☺
Note: links to other related articles (all big-data blogs, including Hudi) are collected here -> Summary of articles on basic knowledge points of big data