1. Introduction to Confluent Platform
- Confluent Platform is a full-scale event streaming platform that lets you easily access, store, and manage data as continuous, real-time streams. Built by the original creators of Apache Kafka®, it extends Kafka's benefits with enterprise-grade features while removing the burden of Kafka management and monitoring.
- By integrating historical and real-time data into a single, central source of truth, Confluent makes it easy to build an entirely new category of modern, event-driven applications, gain a universal data pipeline, and unlock powerful new use cases with full scalability, performance, and reliability.
- Confluent Platform lets you focus on deriving business value from data without worrying about the underlying mechanics, such as how data is transported or integrated between different systems. Specifically, Confluent Platform simplifies connecting data sources to Kafka, building streaming applications, and securing, monitoring, and managing your Kafka infrastructure.
- At the core of Confluent Platform is Apache Kafka, the most popular open-source distributed streaming platform.
Key features
- Connect to other systems: Confluent provides connectors for Kafka, built on the Kafka Connect API, that link Kafka to other systems such as databases, key-value stores, search indexes, and file systems.
- ETL and cached/materialized views (pre-computing query results to speed up reads): the ksqlDB component is the streaming SQL engine for Kafka. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without writing code in a programming language such as Java or Python. ksqlDB is scalable, elastic, fault-tolerant, and real-time, and it supports a wide range of streaming operations, including data filtering, transformation, aggregation, joins, windowing, and sessionization. A short sketch follows below.
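As a flavor of that SQL interface, here is a minimal, hypothetical sketch: the orders stream, its columns, and the backing topic are made up for illustration, and the quick start in section 2 walks through real examples end to end.

```sql
-- Hypothetical stream backed by an 'orders' topic (created automatically if missing)
CREATE STREAM orders (order_id VARCHAR, amount DOUBLE, region VARCHAR)
  WITH (kafka_topic='orders', value_format='JSON', partitions=1);

-- Continuous filter: keep only large orders
CREATE STREAM large_orders AS
  SELECT * FROM orders WHERE amount > 100 EMIT CHANGES;

-- Windowed aggregation: order count per region in 30-second tumbling windows
CREATE TABLE orders_per_region AS
  SELECT region, COUNT(*) AS order_count
  FROM orders
  WINDOW TUMBLING (SIZE 30 SECONDS)
  GROUP BY region
  EMIT CHANGES;
```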
2. Rapid deployment: quick start
platform quickstart: https://docs.confluent.io/platform/current/quickstart/ce-quickstart.html
Category | Command |
---|---|
Start services | confluent local services start |
Stop services | confluent local services stop |
Delete services and data | confluent local destroy |
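For reference, a typical single-node development loop with these commands looks like the sketch below; the status subcommand is assumed to be available in this CLI version for checking which components are up.

```bash
confluent local services start    # bring up ZooKeeper, Kafka, Schema Registry, Connect, ksqlDB, Control Center
confluent local services status   # assumed subcommand: report which components are [UP]
confluent local services stop     # stop all components but keep data under /tmp/confluent.*
confluent local destroy           # stop everything and delete the data directory
```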
a. Unpack the installation and start the services
```bash
#1. Unpack and configure environment variables
[root@c7-docker confluent-6.1.1]# export CONFLUENT_HOME=/opt/confluent-6.1.1/
[root@c7-docker confluent-6.1.1]# echo 'export CONFLUENT_HOME=/opt/confluent-6.1.1/' >> /etc/profile
[root@c7-docker confluent-6.1.1]# export PATH=$PATH:$CONFLUENT_HOME/bin
[root@c7-docker confluent-6.1.1]# echo 'export PATH=$PATH:$CONFLUENT_HOME/bin' >> /etc/profile

#2. Install the kafka-connect-datagen connector
#connector doc: https://docs.confluent.io/home/connect/overview.html
#Plugin installation directory: /opt/confluent-6.1.1/share/confluent-hub-components
[root@c7-docker confluent-6.1.1]# grep 'plugin.path' /opt/confluent-6.1.1/etc/ -r
/opt/confluent-6.1.1/etc/kafka/connect-standalone.properties:# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
/opt/confluent-6.1.1/etc/kafka/connect-standalone.properties:plugin.path=/usr/share/java,/opt/confluent-6.1.1/share/confluent-hub-components
/opt/confluent-6.1.1/etc/kafka/connect-distributed.properties:# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
/opt/confluent-6.1.1/etc/kafka/connect-distributed.properties:plugin.path=/usr/share/java,/opt/confluent-6.1.1/share/confluent-hub-components
/opt/confluent-6.1.1/etc/ksqldb/connect.properties:# plugin.path=
/opt/confluent-6.1.1/etc/schema-registry/connect-avro-standalone.properties:# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
/opt/confluent-6.1.1/etc/schema-registry/connect-avro-standalone.properties:plugin.path=share/java,/opt/confluent-6.1.1/share/confluent-hub-components
/opt/confluent-6.1.1/etc/schema-registry/connect-avro-distributed.properties:# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
/opt/confluent-6.1.1/etc/schema-registry/connect-avro-distributed.properties:plugin.path=share/java,/opt/confluent-6.1.1/share/confluent-hub-components
/opt/confluent-6.1.1/etc/kafka-connect-replicator/replicator-connect-distributed.properties:#plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors
/opt/confluent-6.1.1/etc/kafka-connect-replicator/replicator-connect-standalone.properties:#plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors

#Local compilation and installation (alternative):
# git clone https://github.com/confluentinc/kafka-connect-datagen.git
# git checkout v0.4.0
# mvn clean package
# confluent-hub install target/components/packages/confluentinc-kafka-connect-datagen-0.4.0.zip

### Usage: confluent-hub install : install a component from either Confluent Hub or from a local file
[root@c7-docker confluent-6.1.1]# confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:latest
Running in a "--no-prompt" mode
Implicit acceptance of the license below:
Apache License 2.0
https://www.apache.org/licenses/LICENSE-2.0
Downloading component Kafka Connect Datagen 0.5.0, provided by Confluent, Inc. from Confluent Hub and installing into /opt/confluent-6.1.1//share/confluent-hub-components
Adding installation directory to plugin path in the following files:
  /opt/confluent-6.1.1//etc/kafka/connect-distributed.properties
  /opt/confluent-6.1.1//etc/kafka/connect-standalone.properties
  /opt/confluent-6.1.1//etc/schema-registry/connect-avro-distributed.properties
  /opt/confluent-6.1.1//etc/schema-registry/connect-avro-standalone.properties
Completed
```
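Before moving on, it can help to confirm the connector actually landed in the plugin path; a minimal check (the subdirectory name mentioned in the comment is illustrative and depends on the downloaded version):

```bash
# List the Confluent Hub plugin directory configured in plugin.path above
ls $CONFLUENT_HOME/share/confluent-hub-components
# expect a confluentinc-kafka-connect-datagen-* directory containing the connector jars
```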
```bash
#3. Modify the configuration files (ksqlDB's default connection address is localhost:8088);
#   otherwise, when Control Center is opened remotely at http://192.168.56.7:9021/, running SQL queries will report an error
[root@c7-docker confluent-6.1.1]# cd etc/
[root@c7-docker etc]# ls
cli                       confluent-control-center-fe  confluent-metadata-service  kafka                     ksqldb
confluent-common          confluent-hub-client         confluent-rebalancer        kafka-connect-replicator  rest-utils
confluent-control-center  confluent-kafka-mqtt         confluent-security          kafka-rest                schema-registry
[root@c7-docker etc]# grep ':8088' * -r
confluent-control-center/control-center.properties:#confluent.controlcenter.ksql.ksqlDB.url=http://localhost:8088
confluent-control-center/control-center.properties:confluent.controlcenter.ksql.ksqlDB.url=http://192.168.56.117:8088
confluent-control-center/control-center-minimal.properties:#confluent.controlcenter.ksql.ksqlDB.url=http://localhost:8088
confluent-control-center/control-center-minimal.properties:confluent.controlcenter.ksql.ksqlDB.url=http://192.168.56.117:8088
confluent-control-center/control-center-dev.properties:#confluent.controlcenter.ksql.ksqlDB.url=http://localhost:8088
confluent-control-center/control-center-dev.properties:confluent.controlcenter.ksql.ksqlDB.url=http://192.168.56.117:8088
confluent-control-center/control-center-production.properties:#confluent.controlcenter.ksql.ksqlDB.url=http://ksql:8088
confluent-control-center/control-center-production.properties:confluent.controlcenter.ksql.ksqlDB.url=http://192.168.56.117:8088
ksqldb/ksql-server.properties:#listeners=http://0.0.0.0:8088
ksqldb/ksql-server.properties:listeners=http://192.168.56.117:8088
ksqldb/ksql-server.properties:# listeners=http://[::]:8088
ksqldb/ksql-server.properties:# listeners=https://0.0.0.0:8088
ksqldb/ksql-production-server.properties:#listeners=http://0.0.0.0:8088
ksqldb/ksql-production-server.properties:listeners=http://192.168.56.117:8088
ksqldb/ksql-production-server.properties:# listeners=http://[::]:8088
ksqldb/ksql-production-server.properties:# listeners=https://0.0.0.0:8088

#4. Start the services and check the logs
[root@c7-docker etc]# confluent local services start
The local commands are intended for a single-node development environment only, NOT for production usage. https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /tmp/confluent.007829
Starting ZooKeeper
ZooKeeper is [UP]
Starting Kafka
Kafka is [UP]
Starting Schema Registry
Schema Registry is [UP]
Starting Kafka REST
Kafka REST is [UP]
Starting Connect
Connect is [UP]
Starting ksqlDB Server
ksqlDB Server is [UP]
Starting Control Center
Control Center is [UP]
[root@c7-docker etc]# ls /tmp/confluent.007829
connect  control-center  kafka  kafka-rest  ksql-server  schema-registry  zookeeper

#Data files and log files:
[root@c7-docker lib]# ls /tmp/confluent.007829/
connect  control-center  kafka  kafka-rest  ksql-server  schema-registry  zookeeper
[root@c7-docker lib]# ls /tmp/confluent.007829/connect/
connect.properties  connect.stdout  data  logs
```
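Once everything reports [UP], a quick sanity check is possible over HTTP; this is a sketch assuming the ksqlDB listener configured above (192.168.56.117:8088) and Kafka Connect's REST API on its default port 8083:

```bash
# ksqlDB server info via its REST /info endpoint on the configured listener
curl -s http://192.168.56.117:8088/info

# Kafka Connect REST API: confirm the DatagenConnector plugin is visible to the Connect worker
curl -s http://localhost:8083/connector-plugins | grep -i datagen
```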
b. Control Center web UI: create topics and generate test data
- Visit http://xxx:9021 and operate through the web UI
- Create topics: pageviews, users
- Install kafka connect datagen and generate test data
```
#a. Generate test data for the pageviews topic in AVRO format:
{
  "name": "datagen-pageviews",
  "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "kafka.topic": "pageviews",
  "max.interval": "100",
  "quickstart": "pageviews"
}

#b. Generate test data for the users topic in AVRO format:
{
  "name": "datagen-users",
  "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "kafka.topic": "users",
  "max.interval": "1000",
  "quickstart": "users"
}
```
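In the quick start these settings are entered through the Control Center connector form. An equivalent route, sketched here under the assumption that the Connect REST API is reachable on its default localhost:8083, is to wrap the same settings in a config object and POST them:

```bash
# Submit the pageviews datagen connector through the Kafka Connect REST API
curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "kafka.topic": "pageviews",
    "max.interval": "100",
    "quickstart": "pageviews"
  }
}'

# Verify the connector is registered and running
curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/datagen-pageviews/status
```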
c. Using ksqlDB: view data / create table or stream
ksqlDB docs: https://docs.ksqldb.io/en/latest/concepts/streams/
- Operating ksqlDB through the web UI is equivalent to running it from the command line: ksql http://localhost:8088
```
#########################
# Create a table/stream using ksqlDB

#1. Create a stream (the Kafka topic can be created automatically)
# ksql> CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE)
#         WITH (kafka_topic='locations', value_format='json', partitions=1);
#Parameter description:
#kafka_topic  - Name of the Kafka topic underlying the stream. In this case it will be automatically created
#               because it doesn't exist yet, but streams may also be created over topics that already exist.
#value_format - Encoding of the messages stored in the Kafka topic. For JSON encoding, each row will be stored
#               as a JSON object whose keys/values are column names/values.
#partitions   - Number of partitions to create for the locations topic. Note that this parameter is not needed
#               for topics that already exist.

#2. Run a push query over the stream: coordinates are within 5 miles
# SELECT * FROM riderLocations WHERE GEO_DISTANCE(latitude, longitude, 37.4133, -122.1162) <= 5 EMIT CHANGES;
# Then open a new session, insert data, and observe whether the data is queried in real time
# INSERT INTO riderLocations (profileId, latitude, longitude) VALUES ('c2309eec', 37.7877, -122.4205);
# INSERT INTO riderLocations (profileId, latitude, longitude) VALUES ('18f4ea86', 37.3903, -122.0643);

# ksql> create stream users( rowkey int key, username varchar) with( KAFKA_TOPIC='users',VALUE_FORMAT='JSON');
#  Message
#----------------
# Stream created
#----------------
#ksql> insert into users(username) values('a');
#ksql> insert into users(username) values('b');
#ksql> select 'hello,'+ username as greeting from users emit changes;
#+------------------------------+
#|GREETING                      |
#+------------------------------+
#|hello,b                       |
#|hello,a                       |

# a stream for the pageviews topic:
ksql> CREATE STREAM pageviews WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='AVRO');

# a table for the users topic:
ksql> CREATE TABLE users (id VARCHAR PRIMARY KEY) WITH (KAFKA_TOPIC='users', VALUE_FORMAT='AVRO');

ksql> set 'auto.offset.reset'='earliest';
Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.

ksql> show topics;
 Kafka Topic | Partitions | Partition Replicas
---------------------------------------------------------------
 pageviews   | 1          | 1
 users       | 1          | 1
---------------------------------------------------------------
```
```
#########################
#1. Non-persistent query
ksql> SELECT * FROM pageviews EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|VIEWTIME      |USERID        |PAGEID        |
+--------------+--------------+--------------+
|129321        |User_1        |Page_40       |
Limit Reached
Query terminated

ksql> SELECT * from users EMIT CHANGES LIMIT 1;
+---------------+---------------+---------------+---------------+---------------+
|ID             |REGISTERTIME   |USERID         |REGIONID       |GENDER         |
+---------------+---------------+---------------+---------------+---------------+
|User_5         |1489800608800  |User_5         |Region_5       |MALE           |
Limit Reached
Query terminated

#########################
#2. Persistent query (as a stream): filter the female users in the pageviews stream and save the query results to the pageviews_female topic
ksql> SELECT users.id AS userid, pageid, regionid
        FROM pageviews LEFT JOIN users ON pageviews.userid = users.id
        WHERE gender = 'FEMALE' EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|USERID        |PAGEID        |REGIONID      |
+--------------+--------------+--------------+
|User_3        |Page_89       |Region_7      |
Limit Reached
Query terminated

ksql> CREATE STREAM pageviews_female AS
        SELECT users.id AS userid, pageid, regionid
        FROM pageviews LEFT JOIN users ON pageviews.userid = users.id
        WHERE gender = 'FEMALE' EMIT CHANGES;

ksql> SELECT * from pageviews_female EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|USERID        |PAGEID        |REGIONID      |
+--------------+--------------+--------------+
|User_8        |Page_97       |Region_4      |
Limit Reached
Query terminated
```
```
#########################
#3. Persistent query: filter rows whose regionid ends in 8 or 9, and save the query results to the pageviews_enriched_r8_r9 topic
ksql> SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9' EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|USERID        |PAGEID        |REGIONID      |
+--------------+--------------+--------------+
|User_9        |Page_95       |Region_8      |
Limit Reached
Query terminated

ksql> CREATE STREAM pageviews_female_like_89
        WITH (KAFKA_TOPIC='pageviews_enriched_r8_r9', VALUE_FORMAT='AVRO') AS
        SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9' EMIT CHANGES;

ksql> SELECT * from pageviews_female_like_89 EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|USERID        |PAGEID        |REGIONID      |
+--------------+--------------+--------------+
|User_8        |Page_67       |Region_8      |
Limit Reached
Query terminated

#########################
#4. Persistent query: count pageviews per gender and regionid over a 30-second tumbling window, keep only counts > 1,
#   and save the result as a table (backed by the pageviews_regions topic)
ksql> SELECT gender, regionid, COUNT(*) AS numusers
        FROM pageviews LEFT JOIN users ON pageviews.userid = users.id
        WINDOW TUMBLING (SIZE 30 SECOND)
        GROUP BY gender, regionid
        HAVING COUNT(*) > 1 EMIT CHANGES LIMIT 1;
+--------------+--------------+--------------+
|GENDER        |REGIONID      |NUMUSERS      |
+--------------+--------------+--------------+
|FEMALE        |Region_6      |2             |
Limit Reached
Query terminated

ksql> CREATE TABLE pageviews_regions WITH (KEY_FORMAT='JSON') AS
        SELECT gender, regionid, COUNT(*) AS numusers
        FROM pageviews LEFT JOIN users ON pageviews.userid = users.id
        WINDOW TUMBLING (SIZE 30 SECOND)
        GROUP BY gender, regionid
        HAVING COUNT(*) > 1 EMIT CHANGES;

ksql> SELECT * from pageviews_regions EMIT CHANGES LIMIT 1;
+--------------------+--------------------+--------------------+--------------------+
|KSQL_COL_0          |WINDOWSTART         |WINDOWEND           |NUMUSERS            |
+--------------------+--------------------+--------------------+--------------------+
|OTHER|+|Region_3    |1623913830000       |1623913860000       |3                   |
Limit Reached
Query terminated
```
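To wrap up, the statements below are a small housekeeping sketch using standard ksqlDB commands for inspecting and tearing down what the quick start created; the query ID passed to TERMINATE is illustrative and should be replaced with the one reported by SHOW QUERIES.

```sql
-- Inspect the streams, tables, and persistent queries created above
SHOW STREAMS;
SHOW TABLES;
SHOW QUERIES;

-- A persistent query must be terminated before the stream it writes to can be dropped.
-- CSAS_PAGEVIEWS_FEMALE_5 is a placeholder; use the ID shown by SHOW QUERIES.
TERMINATE CSAS_PAGEVIEWS_FEMALE_5;
DROP STREAM pageviews_female;
```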