[Study manual] Apache Pulsar operation manual

Posted by exploo on Wed, 09 Feb 2022 12:35:32 +0100

1. Apache Pulsar installation and deployment

1.1 Prerequisites

  • ZooKeeper 3.4.5
  • Pulsar installation package 2.8.1
  • Passwordless SSH login between the cluster nodes

1.2 deployment steps

1.2.1 upload the installation package to the linux server

Download address: https://pulsar.apache.org/zh-CN/download/

1.2.2. Extract the archive to the /data directory

tar -zxvf apache-pulsar-2.8.1-bin.tar.gz  -C /data/

1.2.3. Initialize cluster metadata information

Execute on risen-cdh01

bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster \
  --zookeeper risen-cdh01:2181  \
  --configuration-store risen-cdh01:2181  \
  --web-service-url http://risen-cdh01:8089 \
  --web-service-url-tls https://risen-cdh01:8443 \
  --broker-service-url pulsar://risen-cdh01:6650 \
  --broker-service-url-tls pulsar+ssl://risen-cdh01:6651

On success, the output ends with log lines like these:

10:36:09.876 [main] INFO org.apache.bookkeeper.discover.ZKRegistrationManager - Successfully formatted BookKeeper metadata
10:36:09.880 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x16734464b360002 closed
10:36:09.880 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x16734464b360002
10:36:10.033 [main] INFO org.apache.pulsar.PulsarClusterMetadataSetup - Cluster metadata for 'pulsar-cluster-1' setup correctly

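If the command fails, open the ZooKeeper client (zkCli) and delete the znodes left behind by the failed attempt before retrying. The root of the tree looks like this; the zookeeper node belongs to ZooKeeper itself and must be left alone:

[zookeeper, counters, bookies, ledgers, managed-ledgers, schemas, namespace, admin, loadbalance]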

1.2.4. Modify the BookKeeper configuration file

vim conf/bookkeeper.conf

Amend the following:


**PS:** the ports can be customized, but they must not conflict with ports that are already in use

1.2.5. Modify the broker configuration file

vim  conf/broker.conf

Amend the following:


1.2.6. Change all 8080 ports in the configuration files

Port 8080 is so commonly used that it is often already occupied, so it is changed to 8089 here.
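
Before settling on a replacement port, it can help to confirm that nothing on the machine is already listening on it. The following is a minimal sketch using only the JDK; the class and method names are illustrative and not part of Pulsar, and the ports in `main` are simply the ones used in this manual.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper (not part of Pulsar): checks whether a local TCP port can be bound.
final class PortCheck {
    // Returns true if this process could bind the port, i.e. no other listener holds it.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // bind failed: the port is in use (or binding is not permitted)
        }
    }

    public static void main(String[] args) {
        // 8089 and 6650 are the web-service and broker ports chosen in this manual.
        for (int port : new int[] {8089, 6650}) {
            System.out.println(port + (isFree(port) ? " is free" : " is already in use"));
        }
    }
}
```

Run it on each server before starting the services; a port reported as in use should be replaced in the configuration files.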

1.2.7 distribute the modified files to several other servers

scp -r apache-pulsar-2.8.1/ risen-cdh02:$PWD
scp -r apache-pulsar-2.8.1/ risen-cdh03:$PWD

1.2.8. Start the BookKeeper cluster

Run the start command on each of the three machines:

bin/pulsar-daemon start bookie

To stop a bookie later, use:

bin/pulsar-daemon stop bookie

After starting, use the following command to check that the bookie is working:

bin/bookkeeper shell bookiesanity

1.2.9. Start the broker cluster

Run the start command on each of the three machines:

bin/pulsar-daemon start broker

To stop a broker later, use:

bin/pulsar-daemon stop broker

Then run the following on risen-cdh01:

bin/pulsar-admin brokers list pulsar-cluster

If no error is reported, the brokers started successfully

2. Pulsar Manager installation and deployment

2.1 Prerequisites

  • Pulsar cluster installation completed
  • Docker installed on the server

2.2 Installation steps

2.2.1. Pull the latest Docker image

docker pull apachepulsar/pulsar-manager:latest

2.2.2 Run the container

docker run -dit \
    -p 9527:9527 -p 7750:7750 \
    -e SPRING_CONFIGURATION_FILE=/pulsar-manager/pulsar-manager/application.properties \
    apachepulsar/pulsar-manager:latest

2.2.3. Create an account

CSRF_TOKEN=$(curl http://risen-cdh01:7750/pulsar-manager/csrf-token)
curl \
    -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
    -H 'Content-Type: application/json' \
    -X PUT http://risen-cdh01:7750/pulsar-manager/users/superuser \
    -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "username@test.org"}'

2.2.4. Query the cluster

**Pulsar Manager calls the Pulsar admin API, which gets its information from the broker, so the broker URL must be specified for the cluster before Pulsar Manager can query it**

bin/pulsar-admin clusters list

2.2.5. Specify the cluster URL

bin/pulsar-admin clusters update pulsar-cluster --url [your-web-service-url]

2.2.6 Log in and query

Visit http://risen-cdh01:9527

Log in with the account and password created in 2.2.3

Installation completed!

3. Introduction to Pulsar concepts

3.1 Functions and characteristics

3.1.1 Multi-tenancy

The aim is to isolate resources by giving each user a separate allocation: for example, user A may use only 20% of the resources and user B 30% (tenants are used together with namespaces)

Tenants and namespaces are the two core concepts through which Pulsar supports multi-tenancy.
At the tenant level, Pulsar reserves appropriate storage space and applies authorization and authentication mechanisms for specific tenants
At the namespace level, Pulsar provides a set of configuration policies, including quotas, flow control, message expiration policies and isolation between namespaces

3.1.2 Flexible messaging model

  • Unifies the queue model and the stream model: only one copy of the data is stored at the topic level, and the same data can be consumed multiple times. Different subscription models serve streaming and queuing use cases, which greatly improves flexibility
  • Exactly-once semantics are available through transactions, so that data is neither lost nor duplicated during message transmission
  • Stream processing can be done with Pulsar Functions, for example stream ETL that reads from several topics and writes the result to another topic

3.1.3 Cloud-native architecture

  • Cloud-native architecture that separates computing from storage. Data is moved out of the broker and kept in BookKeeper, which acts as shared storage
  • The upper layer, the broker, is stateless and is responsible for serving and dispatching data
  • The lower layer is the persistent storage layer, made up of bookies
  • Pulsar storage is segmented, which avoids capacity limits when scaling out and allows independent scaling and fast data recovery

3.1.4 Segmented streams

  • Unbounded data is treated as a stream of segments, which are stored across tiered storage, the BookKeeper cluster and the broker nodes.

3.1.5 Cross-region replication

  • Enables disaster recovery across clusters and regions

3.2 Components provided by Pulsar

3.2.1 Tiered storage

  • Data is stored in BookKeeper. When too much data accumulates there, read efficiency drops, so part of the data can be offloaded (segment by segment) to other storage, such as HDFS

3.2.2 Pulsar IO (connectors)

  • The main purpose is to integrate Pulsar with the surrounding software ecosystem.
  • There are two kinds of connector: source and sink
  • Examples: HDFS, Spark, Flink, Flume, ES, HBase

3.2.3. Pulsar Functions

  • Provides users with a FaaS platform that is simple to deploy, program (API) and operate
  • Performs lightweight stream computation.
  • Similar to Kafka Streams

3.3 Differences between Pulsar and Kafka

3.3.1 Conceptual model

  • Kafka: producer → topic → consumer group → consumer
  • Pulsar: producer → topic → subscription → consumer

In Kafka, consumers belong to a consumer group, and each partition of a topic is consumed by only one consumer in the group

In Pulsar, consumers attach to a subscription (publish-subscribe model), and each subscription chooses its own consumption strategy. For example, every consumer can receive all of the data by using its own subscription

3.3.2 Message consumption mode

  • Kafka: mainly stream-oriented. Consumption within a single partition is exclusive, and there is no queue consumption mode
  • Pulsar: provides a unified consumption model and API, and you can freely choose whether delivery is shared, exclusive or failover
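
To make the difference between the delivery modes concrete, here is a deliberately simplified sketch of how one subscription might spread messages over three consumers. It is a toy model written for this manual, not Pulsar's dispatcher: the class name, the round-robin rule for shared delivery and the hash-modulo rule for key-based delivery are all assumptions made for illustration (the real Key_Shared implementation assigns hash ranges).

```java
// Toy model of how a subscription picks a consumer for each message.
// Illustration only; real Pulsar dispatch is more involved.
final class SubscriptionSketch {
    enum Mode { EXCLUSIVE, FAILOVER, SHARED, KEY_SHARED }

    // Returns the index of the consumer that receives message number `sequence` (>= 0).
    static int pickConsumer(Mode mode, int consumerCount, long sequence, String key) {
        switch (mode) {
            case EXCLUSIVE: // only one consumer is attached at all
            case FAILOVER:  // several may attach, but only the active one receives
                return 0;
            case SHARED:    // messages are spread round-robin, ordering is not kept
                return (int) (sequence % consumerCount);
            case KEY_SHARED: // the same key always goes to the same consumer
                return Math.floorMod(key.hashCode(), consumerCount);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"order-1", "order-2", "order-1", "order-3", "order-2", "order-1"};
        for (Mode mode : Mode.values()) {
            StringBuilder line = new StringBuilder(mode + ":");
            for (int i = 0; i < keys.length; i++) {
                line.append(" c").append(pickConsumer(mode, 3, i, keys[i]));
            }
            System.out.println(line);
        }
    }
}
```

The exclusive and failover rows show the stream-style behaviour Kafka offers within a partition, while the shared row corresponds to the queue-style consumption that Kafka lacks.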

3.3.3 Message acknowledgement (ack)

  • Kafka uses offsets
  • Pulsar manages a dedicated cursor for each subscription, which tracks acknowledgements so that each message is consumed exactly once

3.3.4 Message retention

  • Kafka: a retention policy can be specified when the topic is created; the default is 7 days. Expired data is deleted directly, whether or not it has been consumed. TTL is not supported
  • Pulsar: a message is deleted only after all subscriptions have consumed it, so data is not lost. A retention period can also be set to keep consumed data, and TTL (how long a message stays valid) is supported

3.3.5 Comparison and summary

  • Pulsar is much faster than Kafka and uses fewer resources

3.4 Common terms

  • **Messages:** messages are the basic "unit" of Pulsar. A message is what a producer publishes to a topic and what a consumer receives from the topic and acknowledges once processing is complete. Messages are similar to letters in a postal service system.

  • **Producers:** a producer is a program that attaches to a topic and publishes messages to a Pulsar broker.

  • **Sending mode:** synchronous (sync) or asynchronous (async)

  • **Consumers:** a consumer sends a flow permit request to the broker to obtain messages. There is a queue on the consumer side that receives messages pushed from the broker. The queue size can be configured through receiverQueueSize (default: 1000). Each time consumer.receive() is called, a message is taken from this buffer.

  • **Receiving mode:** synchronous (sync) or asynchronous (async)

  • **Listening:** the client provides a listener interface for consumers; whenever a new message is received, its received method is called.

  • **Acknowledgement:** when a consumer has successfully consumed a message, it sends an acknowledgement request to the broker. A message is deleted only after all subscriptions have acknowledged it; until then it is stored persistently. To keep messages after they have been acknowledged, configure a message retention policy.

  • **Topic:** a topic in Pulsar is a named channel for transmitting messages from producers to consumers. Topic names are well-structured URLs: {persistent|non-persistent}://tenant/namespace/topic

  • **Namespace:** a namespace is a logical grouping within a tenant. A tenant can create multiple namespaces through the admin API. For example, a tenant with several applications can create a separate namespace for each application. Namespaces let applications create and manage a hierarchy of topics: in my-tenant/app1, the namespace is app1 and the tenant is my-tenant. Any number of topics can be created in a namespace.

  • **Subscriptions:** a subscription is a named configuration rule that determines how messages are delivered to consumers. Pulsar has four subscription modes: exclusive, shared, failover and key_shared.

  • **Multi-topic subscription:** a Pulsar consumer can subscribe to multiple topics at the same time
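
The topic naming scheme described in the list above can be illustrated by splitting a full topic name into its parts. This is a minimal sketch in plain Java; `TopicNameSketch` is a hypothetical helper written for this manual, not the parser from the Pulsar client library, and its validation is intentionally basic.

```java
// Hypothetical helper: splits {persistent|non-persistent}://tenant/namespace/topic into parts.
final class TopicNameSketch {
    final String domain;
    final String tenant;
    final String namespace;
    final String topic;

    private TopicNameSketch(String domain, String tenant, String namespace, String topic) {
        this.domain = domain;
        this.tenant = tenant;
        this.namespace = namespace;
        this.topic = topic;
    }

    static TopicNameSketch parse(String fullName) {
        int sep = fullName.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '://': " + fullName);
        }
        String domain = fullName.substring(0, sep);
        if (!domain.equals("persistent") && !domain.equals("non-persistent")) {
            throw new IllegalArgumentException("unknown domain: " + domain);
        }
        // Limit 3 keeps everything after the second '/' as the local topic name.
        String[] parts = fullName.substring(sep + 3).split("/", 3);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected tenant/namespace/topic: " + fullName);
        }
        return new TopicNameSketch(domain, parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        TopicNameSketch t = parse("persistent://test-tenant/ns1/tp1");
        System.out.println(t.tenant + " / " + t.namespace + " / " + t.topic + " (" + t.domain + ")");
    }
}
```

The tenant and namespace extracted here are the same values that the pulsar-admin commands later in this manual take in the form tenant/namespace.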

4. Pulsar architecture

**Core idea:** separation of computing and storage

4.1 Composition of a single Pulsar cluster

  • One or more brokers handle and load-balance the messages sent by producers (so that data does not skew towards one broker) and dispatch them to consumers
  • Brokers communicate with the Pulsar configuration store to handle coordination tasks, and store messages in BookKeeper instances (bookies)
  • Brokers rely on the cluster's ZooKeeper for certain tasks
  • A BookKeeper cluster made up of multiple bookies is responsible for the persistent storage of messages
  • A ZooKeeper cluster handles coordination tasks between Pulsar clusters


  • The broker is a stateless component, mainly responsible for running two other components:
  • An HTTP server. The default port is 8080; the deployment above uses port 8089. It exposes a REST API for administration and for topic lookup by producers and consumers
  • A dispatcher: an asynchronous TCP server on port 6650 that transfers data over a binary protocol

The broker normally dispatches messages to consumers from the managed ledger cache. When the backlog exceeds the cache size, the broker starts reading entries from BookKeeper instead


Pulsar uses ZooKeeper for metadata storage, cluster configuration and coordination

Configuration store: stores tenants, namespaces and other configuration items that need to be globally consistent


Persistent storage is provided by BookKeeper, a distributed write-ahead log (WAL) system

For its features, refer to the official documentation

4.5 Pulsar proxy

The proxy provides a gateway to all brokers. Clients that cannot connect to the brokers directly can communicate with them through the proxy

5. Introduction to Pulsar operations

5.1 pulsar-admin namespace commands

5.1.1. Create a namespace for a given tenant

pulsar-admin namespaces create test-tenant/test-namespace

5.1.2 list all namespaces under the tenant

pulsar-admin namespaces list test-tenant

5.1.3. Delete existing namespaces under the tenant

pulsar-admin namespaces delete test-tenant/ns1

5.1.4. Set backlog quota policy

pulsar-admin namespaces set-backlog-quota --limit 10 --policy producer_request_hold test-tenant/ns1

5.1.5. View the backlog quota policy

pulsar-admin namespaces get-backlog-quotas test-tenant/ns1

5.1.6. Remove backlog quota policy

pulsar-admin namespaces remove-backlog-quota test-tenant/ns1

5.1.7. Set persistence policy

  • bookkeeper-ack-quorum: the number of acknowledgements (guaranteed copies) to wait for on each entry. Default: 0
  • bookkeeper-ensemble: the number of bookies used by a single topic. Default: 0
  • bookkeeper-write-quorum: the number of copies written for each entry. Default: 0
  • ml-mark-delete-max-rate: rate limit for mark-delete operations (0 means unlimited). Default: 0.0

pulsar-admin namespaces set-persistence --bookkeeper-ack-quorum 2 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 2 --ml-mark-delete-max-rate 0 test-tenant/ns1
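
The three BookKeeper values are not independent: the ensemble must be at least as large as the write quorum, which in turn must be at least as large as the ack quorum. The sketch below checks a combination before it is passed to set-persistence; `PersistenceCheck` and its method names are hypothetical helpers written for this manual, not part of pulsar-admin.

```java
// Hypothetical helper: sanity-checks a persistence policy before applying it.
final class PersistenceCheck {
    // Valid when ensemble >= write quorum >= ack quorum.
    static boolean isValid(int ensemble, int writeQuorum, int ackQuorum) {
        return ensemble >= writeQuorum && writeQuorum >= ackQuorum;
    }

    // Each entry is written to writeQuorum bookies and confirmed after ackQuorum
    // acknowledgements, so this many bookies may be slow without delaying a write.
    static int slowBookiesTolerated(int writeQuorum, int ackQuorum) {
        return writeQuorum - ackQuorum;
    }

    public static void main(String[] args) {
        // Values from the set-persistence example: ensemble 3, write quorum 2, ack quorum 2.
        System.out.println("valid: " + isValid(3, 2, 2));
        System.out.println("slow bookies tolerated per write: " + slowBookiesTolerated(2, 2));
    }
}
```

With the values from the example command, every write has to wait for both copies, so raising the write quorum to 3 while keeping the ack quorum at 2 would let one slow bookie go unnoticed by producers.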

5.1.8. Obtain persistence strategy

pulsar-admin namespaces get-persistence test-tenant/ns1

5.1.9. Unload a namespace

pulsar-admin namespaces unload --bundle 0x00000000_0xffffffff test-tenant/ns1

5.1.10 Clear the message backlog

pulsar-admin namespaces clear-backlog --sub my-subscription test-tenant/ns1

5.1.11 Set message retention parameters

A namespace contains multiple topics. For each topic, the retained data (storage size) should stay within a given threshold, or be kept only for a given period of time. The following command configures the retention size and retention time for the topics in the specified namespace.

pulsar-admin namespaces set-retention --size 10 --time 100 test-tenant/ns1

5.1.12. Set message dispatch rate

Sets the message dispatch rate for all topics in the given namespace. The rate can be limited by number of messages (msg-dispatch-rate) or by number of bytes (byte-dispatch-rate) per period, and the length of the period in seconds is configured with dispatch-rate-period. The default value of both msg-dispatch-rate and byte-dispatch-rate is -1, which disables the limit.

pulsar-admin namespaces set-dispatch-rate test-tenant/ns1 \
--msg-dispatch-rate 1000 \
--byte-dispatch-rate 1048576 \
--dispatch-rate-period 1
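
How the two limits interact within one period can be sketched with a simple fixed-window counter. This is an illustrative model only, assuming one budget per period; `DispatchBudget` is a hypothetical class and is not how the broker's rate limiter is implemented.

```java
// Toy fixed-window model of a dispatch-rate limit; not the broker's actual limiter.
final class DispatchBudget {
    private final long maxMessages; // msg-dispatch-rate, -1 disables the limit
    private final long maxBytes;    // byte-dispatch-rate, -1 disables the limit
    private long messagesUsed;
    private long bytesUsed;

    DispatchBudget(long maxMessages, long maxBytes) {
        this.maxMessages = maxMessages;
        this.maxBytes = maxBytes;
    }

    // Tries to dispatch one message of the given size in the current period.
    boolean tryDispatch(long sizeBytes) {
        boolean messageOk = maxMessages < 0 || messagesUsed + 1 <= maxMessages;
        boolean bytesOk = maxBytes < 0 || bytesUsed + sizeBytes <= maxBytes;
        if (!messageOk || !bytesOk) {
            return false; // whichever limit is hit first stops dispatching
        }
        messagesUsed++;
        bytesUsed += sizeBytes;
        return true;
    }

    // Called when a new dispatch-rate-period begins.
    void startNewPeriod() {
        messagesUsed = 0;
        bytesUsed = 0;
    }

    public static void main(String[] args) {
        // Limits from the command above: 1000 messages and 1048576 bytes per period.
        DispatchBudget budget = new DispatchBudget(1000, 1048576);
        int sent = 0;
        while (budget.tryDispatch(2048)) {
            sent++;
        }
        System.out.println("messages of 2048 bytes dispatched in one period: " + sent);
    }
}
```

With messages of 2 KB, the byte limit is exhausted before the message limit, which shows why both values should be sized together.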

5.1.13. Get message dispatch rate configuration

Messages sent / sec

pulsar-admin namespaces get-dispatch-rate test-tenant/ns1

5.2 pulsar-admin tenant commands

5.2.1. List tenants

pulsar-admin tenants list

5.2.2. Create a tenant

pulsar-admin tenants create [tenant name]

5.2.3 Delete a tenant

pulsar-admin tenants delete [tenant name]

5.3 pulsar-admin topic commands

5.3.1 List all persistent topics under the specified namespace

pulsar-admin persistent list my-tenant/my-namespace

5.3.2. Grant a client role permission to perform operations on a topic

pulsar-admin persistent grant-permission \
  --actions produce,consume --role application1 \
  persistent://test-tenant/ns1/tp1

5.3.3 Get permissions

pulsar-admin persistent permissions \
  persistent://test-tenant/ns1/tp1

The output lists, for each role such as application1, the actions it has been granted

5.3.4. Revoke permissions

pulsar-admin persistent revoke-permission \
  --role application1 \
  persistent://test-tenant/ns1/tp1

5.3.5. Delete topic

pulsar-admin persistent delete  persistent://test-tenant/ns1/tp1 

5.3.6. Unload a topic

pulsar-admin persistent unload   persistent://test-tenant/ns1/tp1

5.3.7. Peek at 10 messages in a topic

pulsar-admin persistent peek-messages \
  --count 10 --subscription my-subscription \
  [your topic]

5.3.8. Create topic

**Note:** whether or not the topic is partitioned, if nothing happens on it within 60 s of its creation, it is considered inactive and is deleted

Relevant parameters:

brokerDeleteInactiveTopicsEnabled: defaults to true; controls whether automatic deletion of inactive topics is enabled
brokerDeleteInactiveTopicsFrequencySeconds: defaults to 60 s
  • Create a non-partitioned topic
pulsar-admin topics create persistent://my-tenant/my-namespace/mytopic
  • Create a partitioned topic
pulsar-admin topics create-partitioned-topic persistent://my-tenant/my-namespace/mytopic --partitions 5
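
A partitioned topic is backed by one internal topic per partition, each named after the parent topic with a -partition-N suffix. The sketch below lists the names that result from the command above; `PartitionNames` is a hypothetical helper for illustration, not an API of the Pulsar client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the internal topic names behind a partitioned topic.
final class PartitionNames {
    static List<String> of(String topic, int partitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            names.add(topic + "-partition-" + i); // partitions are numbered from 0
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : of("persistent://my-tenant/my-namespace/mytopic", 5)) {
            System.out.println(name);
        }
    }
}
```

These are the names that appear when listing topics in the namespace, and each of them can be served by a different broker.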

5.3.9. Look up which broker serves a topic

pulsar-admin topics lookup persistent://my-tenant/my-namespace/mytopic

6. Permission operations

Permissions can also be managed at the topic level; the tests here are done at the namespace level

6.1 configuration

6.1.1. Generate a secret key

Create a key directory under the Pulsar root directory to store the keys

bin/pulsar tokens create-secret-key --output key/my-secret.key --base64

6.1.2. Create an identity token

Create a super user; the name can be anything. It is mainly used for subsequent administration.

bin/pulsar tokens create --secret-key key/my-secret.key --subject hz-super

The command prints the token of the super user
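
The token printed by this command is a JSON Web Token (JWT), and the value passed to --subject ends up in its "sub" claim, which the broker uses as the role name. The following sketch, using only the JDK, decodes the payload of such a token to show the subject. `TokenPeek` is a hypothetical helper, the sample token it builds is unsigned and exists purely for the demonstration, and the regular expression stands in for a proper JSON parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inspects the payload of a JWT without verifying its signature.
final class TokenPeek {
    // Returns the decoded JSON payload, the second of the three dot-separated parts.
    static String payload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    // Extracts the "sub" claim with a regex; good enough for a sketch only.
    static String subject(String jwt) {
        Matcher m = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]*)\"").matcher(payload(jwt));
        if (!m.find()) {
            throw new IllegalArgumentException("no sub claim in payload");
        }
        return m.group(1);
    }

    // Builds an UNSIGNED sample token for demonstration; a broker would reject it.
    static String sampleToken(String subject) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"HS256\"}".getBytes(StandardCharsets.UTF_8));
        String body = enc.encodeToString(("{\"sub\":\"" + subject + "\"}").getBytes(StandardCharsets.UTF_8));
        return header + "." + body + ".not-a-real-signature";
    }

    public static void main(String[] args) {
        String token = args.length > 0 ? args[0] : sampleToken("hz-super");
        System.out.println("subject: " + subject(token));
    }
}
```

Passing a real token as the first argument prints the role it carries, which is a quick way to check which token belongs to which user.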


6.1.3. Configure broker.conf


6.1.4. Configure client.conf


6.1.5 restart broker and verify

bin/pulsar-daemon stop broker
bin/pulsar-daemon start broker
bin/pulsar-admin tenants list

The command now runs normally


6.2 Testing from Java

6.2.1. Generate a new user's token

bin/pulsar tokens create --secret-key key/my-secret.key  --subject test

Get the token


6.2.2 Grant permissions

bin/pulsar-admin namespaces grant-permission hz-test-tenants/hz-ns02 --role test  --actions produce,consume

6.2.3. Build the Java client code

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://risen-cdh01:6650").build(); // no token configured yet

Running it directly reports an error

Exception in thread "main" org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:965)
	at org.apache.pulsar.client.impl.ConsumerBuilderImpl.subscribe(ConsumerBuilderImpl.java:97)
	at ConsumerDemo.main(ConsumerDemo.java:27)

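6.2.4. Add token configuration

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://risen-cdh01:6650")
        .authentication(AuthenticationFactory.token("[token generated in 6.2.1]"))
        .build();

It now runs normally

6.3 Granting permissions from Java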

6.3.1. Create PulsarAdmin

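The super user's token must be used for authentication; otherwise an insufficient-permission error is reported

PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl("http://risen-cdh01:8089")
        .authentication(AuthenticationFactory.token("[hz-super token from 6.1.2]")).build();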

6.3.2. Query the permissions granted on a namespace

System.out.println(admin.namespaces().getPermissions("hz-test-tenants/hz-ns02"));

{hz-super=[consume, produce], hz-test-tenants=[consume, produce]}

6.3.3. Grant permissions on a namespace

Set<AuthAction> action  = new HashSet<AuthAction>();
action.add(AuthAction.produce); // the role is granted produce only
admin.namespaces().grantPermissionOnNamespace("hz-test-tenants/hz-ns02", "hz-produce", action);


{hz-test-tenants=[consume, produce], hz-super=[consume, produce], hz-produce=[produce]}

6.3.4. Revoke permissions on a namespace

admin.namespaces().revokePermissionsOnNamespace("hz-test-tenants/hz-ns02", "hz-produce");


{hz-test-tenants=[consume, produce], hz-super=[consume, produce]}

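6.3.5. Query the permissions granted on a topic

System.out.println(admin.topics().getPermissions("persistent://hz-test-tenants/hz-ns02/hz-topic2"));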


{hz-super=[consume, produce], hz-test-tenants=[consume, produce]}

6.3.6 Grant a user permissions on a topic

Set<AuthAction> action1 = new HashSet<AuthAction>();
action1.add(AuthAction.consume); // the role is granted consume only
admin.topics().grantPermission("persistent://hz-test-tenants/hz-ns02/hz-topic2", "hz-test-topic", action1);


{hz-super=[consume, produce], hz-test-topic=[consume], hz-test-tenants=[consume, produce]}

6.3.7. Revoke a user's permissions on a topic

admin.topics().revokePermissions("persistent://hz-test-tenants/hz-ns02/hz-topic2", "hz-test-topic");


{hz-super=[consume, produce], hz-test-tenants=[consume, produce]}

7. Message retention and expiration policies


  • Users can retain messages that have already been acknowledged by consumers
  • Default value: defaultRetentionTimeInMinutes=0
  • Retention can be configured in the broker configuration or from the command line:
$ pulsar-admin namespaces get-retention [your tenant]/[your-namespace]
{
  "retentionTimeInMinutes": 10,
  "retentionSizeInMB": 0
}


int retentionTime = 10; // 10 minutes
int retentionSize = 500; // 500 megabytes
RetentionPolicies policies = new RetentionPolicies(retentionTime, retentionSize);
admin.namespaces().setRetention("hz-test-tenants/hz-ns02", policies );

7.2 TTL (Time To Live) policy

  • For unacknowledged messages, a TTL can be set so that they are automatically treated as acknowledged once it expires
  • By default, Pulsar persists all unacknowledged messages
  • If there are many unacknowledged messages, this default leads to a large message backlog
  • Once the configured TTL is exceeded, an unacknowledged message enters the acknowledged state and is then discarded or kept according to the retention policy
pulsar-admin namespaces get-message-ttl [your tenant]/[your namespace]
60
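
How TTL and retention combine can be summarised in two small rules. The sketch below is a simplified model of the behaviour described in this chapter, not broker code: `MessageLifecycle` is a hypothetical class, it considers only time-based retention (not size), and it assumes that a retention time of -1 means messages are kept indefinitely.

```java
// Simplified model of the TTL and retention rules; an illustration, not broker code.
final class MessageLifecycle {
    // A message counts as acknowledged on a subscription if the consumer acked it,
    // or if a TTL is set (ttlSeconds > 0) and the message is older than that TTL.
    static boolean treatedAsAcked(boolean ackedByConsumer, long ageSeconds, long ttlSeconds) {
        return ackedByConsumer || (ttlSeconds > 0 && ageSeconds > ttlSeconds);
    }

    // Once acknowledged on every subscription, a message can be deleted as soon as
    // it falls outside the retention time. Assumption: -1 means retain forever.
    static boolean canBeDeleted(boolean ackedOnAllSubscriptions, long ageMinutes, long retentionMinutes) {
        return ackedOnAllSubscriptions && retentionMinutes >= 0 && ageMinutes >= retentionMinutes;
    }

    public static void main(String[] args) {
        // An unacked message that is 90 s old with the 60 s TTL shown above.
        boolean acked = treatedAsAcked(false, 90, 60);
        System.out.println("treated as acknowledged: " + acked);
        // With 10 minutes of retention it is still kept for now.
        System.out.println("can be deleted: " + canBeDeleted(acked, 1, 10));
    }
}
```

In short, TTL decides when a message stops counting as backlog, and retention decides how long it stays on disk after that.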



8. Pulsar IO connectors

Taking the MySQL binlog as an example

8.1. Download the connector

Find the corresponding connector on the download page and download it


8.2. Create a connectors folder

It holds the downloaded connectors

8.3. Create connector configuration file

tenant: "public"
namespace: "default"
name: "debezium-mysql-source"
topicName: "debezium-mysql-topic"
archive: "connectors/pulsar-io-debezium-mysql-2.8.1.nar"
parallelism: 1

configs:
    database.hostname: "risen-cdh01"
    database.port: "3306"
    database.user: "cdh"
    database.password: "123456"
    database.server.id: "184054"
    database.server.name: "dbserver1"
    database.whitelist: "test"
    database.history: "org.apache.pulsar.io.debezium.PulsarDatabaseHistory"
    database.history.pulsar.topic: "history-topic"
    database.history.pulsar.service.url: "pulsar://risen-cdh01:6650"
    key.converter: "org.apache.kafka.connect.json.JsonConverter"
    pulsar.service.url: "pulsar://risen-cdh01:6650"
    value.converter: "org.apache.kafka.connect.json.JsonConverter"
    offset.storage.topic: "offset-topic"

8.4. Create the connector

bin/pulsar-admin source create --source-config-file debeziumConf/mysql.yaml

8.5. Verify that it succeeded

bin/pulsar-admin persistent list public/default

The corresponding topic has been generated

8.6 Simulate a consumer subscribing

bin/pulsar-client consume -s "sub-products" public/default/dbserver1.test.mysql_func2 -n 0


Topics: Apache