A brief introduction to using Kafka: 1. clustering 2. principles 3. terminology

Posted by swallace on Thu, 28 Nov 2019 12:09:26 +0100

[TOC]

Section 1 Kafka cluster

Before we begin

If you are a developer and have no interest in building a kafka cluster, you can skip this chapter and go straight to tomorrow's content

If you think it does no harm to know more, please keep reading.

As a reminder, there are many figures in this chapter

Kafka cluster construction

Summary

Building a kafka cluster is complicated. Although it is only downloading files and modifying configuration, there are a lot of details

The basic environment needs three zk servers and three kafka servers

Operation process

Look at the diagram

It looks rather long, so I won't use this method; Docker simplifies the process

Quick Kafka cluster setup

Install Docker

Environment check

uname -a                  # check the kernel and architecture first
yum -y install docker     # install from the distro repository
service docker start
# or use the official install script
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

Mirror acceleration

vi /etc/docker/daemon.json
    {
      "registry-mirrors": ["https://uzoia35z.mirror.aliyuncs.com"]
    }
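
After changing daemon.json, restart Docker so the mirror takes effect (a small sketch):

service docker restart
docker info | grep -A1 "Registry Mirrors"   # confirm the mirror is active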

zookeeper cluster

Install Docker Compose

Create a new docker network
docker network create --driver bridge --subnet 172.29.0.0/25 \
  --gateway 172.29.0.1  elk_zoo
docker network ls
yml script

The configuration is too long, so here is just the structure; the source file will be posted on the blog later

The items listed all need to be configured; the key points are:

  • ports: port mappings
  • volumes: mounted volumes
  • environment: environment variables
  • networks: two parts, the fixed ip and the shared network

Check against the configuration file to confirm
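
As a minimal sketch of what one zookeeper service might look like (an assumption: the official zookeeper image; the environment variables are the same ones shown in Section 3, and zoo2/zoo3 differ only in id, hostname and ip):

cat > docker-compose.yml <<'EOF'
version: '2'
services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    container_name: zoo1
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    networks:
      default:
        ipv4_address: 172.29.0.11
networks:
  default:
    external:
      name: elk_zoo
EOF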

docker-compose up -d
Verification

ZooInspector

cd zookeeper/src/contrib/zooinspector/
# Failed to open, need to verify
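
Since ZooInspector failed to open, a quick fallback check from the shell (assuming the containers are named zoo1 to zoo3, as above):

for s in zoo1 zoo2 zoo3; do docker exec $s zkServer.sh status; done
# one node should report Mode: leader and the other two Mode: follower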

Kafka cluster

image
docker pull wurstmeister/kafka
docker pull sheepkiller/kafka-manager
yml script

The configuration is too long, so here is just the structure; the full file is attached at the end of this article

The items listed all need to be configured; pay attention to the following:

  • ports: port mappings
  • volumes: mounted volumes
  • environment: environment variables
  • external_links: links to the zookeeper containers
  • networks: two parts, the fixed ip and the shared network

Check against the configuration file to confirm

docker-compose up -d
Verification

Use Kafka Manager's management page: browse to the local ip plus the manager port (9000 inside the container; the attached compose file maps it to 9002 on the host)

Wrapping up

In keeping with my faith in the god of laziness, docker got the cluster built in a short time

Please look forward to tomorrow's command-line practice

Today's three diagrams are relatively complex and do not need to be memorized; just walk through the process against the configuration files

Section 2 Cluster management tools

Let's start with a question. Yesterday I built the kafka cluster and installed the management tool, as shown in the screenshot

Can you see, or guess, the problems in this cluster? If you are confident, add me as a friend and message me privately; if your idea is right, I will send a small red envelope as encouragement

Cluster management tools

Summary

Kafka Manager is a common kafka cluster management tool. There are many similar tools, including ones companies develop in-house

Operation process

After the cluster is configured, you can log in to Kafka Manager through a browser and add the cluster to manage it

After adding, it will be displayed as follows

View Broker information

Click Topic to view the topics

Click again to view the settings of a single topic

Other

Preferred Replica Election
Reassign Partitions
Consumers

These involve preferred replica election, partition reassignment, and consumers respectively

Because the cluster has just been built, much of this information is not visible yet; the next few articles will show it together with the command-line operations

Cluster Issues

The following records some common faults and troubleshooting ideas:

  1. Works on a single machine, but the cluster fails to send messages

    The host name cannot be set to 127.0.0.1

  2. Cannot consume messages after an upgrade

    Check the default topic

    __consumer_offsets

  3. Slow responses

    Use the performance test script:

    kafka-producer-perf-test.sh

    Analyze the results and produce a report

    Check the jstack output or trace through the source code

  4. The log keeps reporting exceptions

    Check the kafka log and GC log, check the zk log and GC log, and check node memory monitoring

    In the end the node reporting the exception was taken offline and then restored, which solved it

  5. Docker restarts endlessly when a data volume is mounted

    The logs showed a permission error; configure

    privileged: true

  6. Running a kafka command inside docker complains that the address is already in use

    unset JMX_PORT;bin/kafka-topics.sh ..

    A more convenient way is to remove the JMX_PORT variable defined in kafka's env script

Section 3 Operating the cluster from the command line

Normally, Kafka is used from code

However, occasionally you want to confirm whether Kafka is at fault or your code is

Or you have neither the environment nor the time to work out a piece of code; then simply using the command line is fine

docker
docker inspect zookeeper
zookeeper
Cluster view

Log in to the cluster and check its status

docker exec -it zoo1 bash
zkServer.sh status  

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Mode: leader
# If it shows Mode: standalone, the node has not joined a cluster

Configuration file

If the status is standalone, check the following files:

vi zoo.cfg    # server.1=zoo1:2888:3888, one line per node
vi myid       # 1 or 2

# It can also be in the form of environment variables
      ZOO_MY_ID=3 \
      ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888"

Start zk cluster

./zkServer.sh start
jps   # QuorumPeerMain
kafka view
docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids    

# View the node id of kafka
[1, 2, 3]
topic
Create topic

Note that the following commands are all executed in kafka's directory

cd /opt/kafka_2.12-2.3.0/  

unset JMX_PORT;bin/kafka-topics.sh --create --zookeeper zoo1:2181 --replication-factor 1 --partitions 1 --topic test1  

# Newer clients can pass --bootstrap-server broker1:9091 instead of --zookeeper
# --config delete.retention.ms=21600000 keeps delete markers for 6 hours
Create cluster topic

Replication factor 1, 3 partitions, named test

unset JMX_PORT;bin/kafka-topics.sh --create --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --replication-factor 1 --partitions 3 --topic test
View topic

List and details

unset JMX_PORT; bin/kafka-topics.sh --list --zookeeper zoo1:2181,zoo2:2181,zoo3:2181

unset JMX_PORT;bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic __consumer_offsets
Delete topic

By default this only marks the topic for deletion

unset JMX_PORT;bin/kafka-topics.sh --delete  --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test  

# Set delete.topic.enable=true to actually delete it
Producer
Send messages
cat config/server.properties |grep listeners # Get listening address  

unset JMX_PORT;bin/kafka-console-producer.sh  --broker-list broker1:9091  --topic test2 
# You can type messages once it is running
Throughput test
unset JMX_PORT;bin/kafka-producer-perf-test.sh --num-records 100000 --topic test --producer-props bootstrap.servers=broker1:9091,broker2:9092,broker3:9093 --throughput 5000 --record-size 102400 --print-metrics

# 3501 records sent, 699.2 records/sec (68.28 MB/sec), 413.5 ms avg latency, 1019.0 ms max latency.
# Careful: 100,000 records of 100 KB each is roughly 10 GB on disk (see Section 4)
Consumer
Receive messages
unset JMX_PORT;bin/kafka-console-consumer.sh  --bootstrap-server  broker1:9091  --topic test2  

# Receives in real time; add --from-beginning to read from the start
List consumers
unset JMX_PORT;bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 --list
# KafkaManagerOffsetCache
# console-consumer-26390
View partition messages

View the latest messages received by the current partition

unset JMX_PORT;bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test2 --offset latest --partition 0
Throughput test
bin/kafka-consumer-perf-test.sh --topic test --messages 100000 --num-fetch-threads 10 --threads 10 --broker-list broker1:9091,broker2:9092,broker3:9093 --group console-consumer-26390
Fault tolerance
unset JMX_PORT;bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2  

docker stop broker3
# Kill a broker and check with the above command. Note that Leader: -1
  
unset JMX_PORT;bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2

All the commands above were typed by hand to make sure they actually work

The commands are relatively long; copy each one from its code box in one go and ignore the line wrapping

Section 4 Kafka terminology

Speaking of yesterday's command-line operation of the kafka cluster, there was actually a small incident

The cluster went down while running the producer throughput test

My Alibaba Cloud instance has limited disk space, and kafka-producer-perf-test.sh filled it all in a short time

Today we will go over some basic knowledge of kafka. Novices, this is for you; veterans, please skip ahead

brief introduction

  • Kafka is written in Scala
  • Official homepage: kafka.apache.org
  • It is positioned as a distributed real-time streaming platform
  • Its performance depends heavily on disk performance
  • Messages are stateless and must be deleted on a time or size basis (see the sketch below)
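
The time- or size-based deletion in the last point maps to standard broker settings in server.properties (a sketch; the keys are standard Kafka configs, the values are illustrative):

# delete log segments older than 7 days...
log.retention.hours=168
# ...or once a partition's log exceeds 1 GB
log.retention.bytes=1073741824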

purpose

Message system

Kafka is well known as message-oriented middleware

Application monitoring

For monitoring it is mainly used together with ELK

User behavior tracking

Records and carries large amounts of user information, then hands it to various big data systems for processing, such as Hadoop, Spark, Storm

Stream processing

Collect stream data

This one is a gap for me: there was a small error in the configuration file during yesterday's command-line session, which will be fixed later

Persistence log

This mainly exploits Kafka's performance characteristics, for example together with Flume + HDFS

performance

Kafka is said to handle tens of millions of messages; our company's volume is not that large, so I won't comment on that, but millions are widely acknowledged

The good performance comes from heavy use of the operating system's page cache rather than direct physical I/O, together with append-only writes that avoid the performance nightmare of random disk writes

It also uses zero-copy technology, with sendfile as the representative, to copy data inside the kernel and avoid user-space buffers

Data preservation

Here are several directories where Kafka stores information in ZooKeeper. A rough understanding is enough. To view them:

docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids    
...
Directory                  Purpose
brokers                    Stores cluster and topic information
controller                 Stores node-election information
admin                      Stores the output of script commands
isr_change_notification    Records ISR changes
config                     Records the cluster id and version number
controller_epoch           Records the controller's version number, to avoid stale ("zombie") controller problems
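
As a sketch, these nodes can be inspected from zkCli.sh (the exact JSON content varies by version):

get /controller
# {"version":1,"brokerid":1,"timestamp":"..."}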

Special terms

Term             Purpose
broker           A Kafka server
cluster          A working unit composed of multiple brokers
message          The basic data unit
batch            A group of messages
replica          The redundant form of a message
message schema   How messages are serialized
commit           Updating the current position within a partition
topic            Analogous to a table in mysql; the command-line term is topic
partition        The command-line term is partition
producer         Responsible for message input
consumer         Responsible for message output

Supplement:

  1. Message location:
    topic, partition and offset together locate a unique message
  2. Replicas are divided into leader replicas and follower replicas

    The follower's job is to replicate data from the leader, and when the leader fails a new leader is elected from among the followers

  3. A topic can have multiple partitions, each holding messages in multiple segments

configuration file

There are four main related configuration files:

Purpose                   File name
Broker configuration      server.properties
ZooKeeper configuration   zookeeper.properties
Consumer configuration    consumer.properties
Producer configuration    producer.properties

Basics are basics. yann had to learn these fundamentals too; only after seeing them again and again can you afford to look down on the content above. So, keep going

Section 5 How the Kafka cluster works

Before we begin

Yesterday I sent my own public account article to the big shots and was criticized: the format was too chaotic to read. So began my formatting adjustment tour, with dozens of previews sent out, leaving me dizzy.

So today's content will be a little thin; I'm sorry

Cluster principle

Here is a brief introduction to kafka's cluster principle. As mentioned before, the cluster consists of three zookeeper nodes and three kafka brokers

The relationship between them is similar to the following figure:

The exact relationship is not important. Just know that ZooKeeper acts like the database and Kafka like the instances; each side is strong on its own (three nodes each), and the combination is stronger

Why does Kafka cling to zk's thigh? In fact, it uses zk to solve the distributed-consistency problem: three nodes spread across three servers keep the data consistent. Many systems maintain this themselves, but Kafka calls in outside help

However, ZooKeeper alone is not enough; Kafka itself still has to do considerable work

Kafka's cluster ensures consistency mainly through data replication and leader election

Data replication means that although there are three replicas, only the leader serves external requests. The followers constantly watch the leader replica, and whenever there is a change they decisively copy it over to themselves

Leader election means that when the current leader dies, a new leader is promptly chosen from among the followers and promoted

How does anyone know the leader is dead? After each Kafka instance starts, it registers itself with the ZooKeeper service in the form of a session. Once the instance has a problem, its session with ZooKeeper cannot be maintained and it times out

Just like clocking in at work: if someone hasn't clocked in for a while, you know the leader has gone cold

One more term

ISR: the leader node keeps track of the list of replicas it keeps in sync with, which is called ISR (in sync replica)
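
The ISR is visible in the describe output from Section 3; a sketch of a typical line (the values here are illustrative):

unset JMX_PORT;bin/kafka-topics.sh --describe --zookeeper zoo1:2181 --topic test
# Topic: test  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3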

Workflow

After knowing the principle of clustering, let's take a look at the workflow

First, the application connects to the ZooKeeper cluster to get some metadata about the Kafka cluster; the most important part is learning who the leader is. The rest is simple:

  1. The application sends a message to the leader
  2. The leader writes the message to a local file
  3. The followers learn of it and synchronize the message
  4. Having synchronized, each follower sends an ACK to the leader
  5. The leader collects the ACK signals from all followers and then acknowledges to the application

The general process is the steps above, but there are details, and parameters can be used to fine-tune them

For example, the leader does not write to the hard disk the moment it receives a message; there is a time or message-count threshold. Whether the leader replies to the application first or ensures synchronization first, and which partition a message is written to, all depend on parameters. Physically, a partition corresponds to a folder, and multiple replicas of one partition are never placed on the same physical machine
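
As a rough illustration, these thresholds correspond to standard settings (a sketch; the keys are standard Kafka configs, the values are illustrative):

# in config/server.properties: flush to disk after this many messages...
log.flush.interval.messages=10000
# ...or after this many milliseconds
log.flush.interval.ms=1000

# Whether the leader answers the application before or after the followers
# have synchronized is the producer-side acks setting (0, 1 or all)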

Kafka has an important feature: it guarantees the order of messages within a single partition. The reason is that Kafka sets aside separate disk space and writes data sequentially. A partition contains multiple groups of segment files; when the conditions are met they are written to disk, and a new segment is opened once the current one is finished
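
To see the segments, peek into a partition folder on a broker (a sketch; the path is an assumption based on the volume mount in the attached compose file, and the segment names follow Kafka's standard layout):

docker exec -it broker1 bash
ls /kafka/kafka-logs-broker1/test-0/
# 00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex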

Consumption mechanism

Finally, consumers are applications too. The application actively pulls messages from Kafka, and again it is the leader it pulls from. Given Kafka's strong performance, multiple consumers can be attached at the same time, and consumers can form a consumer group; consumers in the same group consume data from different partitions under the same topic

When there are enough partitions, one consumer may consume more than one partition. But if there are more consumers than partitions, some consumers will have nothing to do and just stand by. So do not let the number of consumers exceed the number of partitions of the topic (see the sketch below)
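
A quick sketch of this with the console consumer (demo-group is an illustrative name; run the same command in several terminals):

unset JMX_PORT;bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test --group demo-group
# With 3 partitions and 2 consumers, one consumer is assigned 2 partitions;
# a 4th consumer in the same group would sit idle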

Ownership

Handling of messages when the client crashes

  • The consumer group shares the work of receiving
  • When ownership is transferred, a rebalance happens
  • Consumers send heartbeats to the broker to maintain ownership
  • The client pulls the data and records what it has consumed (see the sketch after this list)
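
Ownership can be observed with the consumer-groups tool from Section 3 (a sketch; the group name is one listed earlier):

unset JMX_PORT;bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 --describe --group console-consumer-26390
# Shows, per partition, the current offset, the lag and the owning consumer id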

Log compaction

  • It operates on the partitions of a topic
  • Compaction does not reorder messages
  • The offset of a message does not change
  • Message offsets remain in order (see the sketch below)
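
A sketch of enabling compaction when creating a topic (cleanup.policy is the standard config key; the topic name is illustrative):

unset JMX_PORT;bin/kafka-topics.sh --create --zookeeper zoo1:2181 --replication-factor 1 --partitions 1 --topic compact-demo --config cleanup.policy=compact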

summary

I'm really sorry; this feels a bit like a tiger's head with a snake's tail: the first two sections are very detailed, but the rest is cursory. After all, Kafka is middleware, not a platform. Going further would mean writing about production architecture or describing business processes, which strays from the original intention; I set out to write a simple Kafka primer.

Let's stop here for now; I'll fill in other ideas later, along with a few things I ran into while working on ELK.

Thank you for reading.

Kafka configuration file attached:

# To create the network: docker network create --driver bridge --subnet 172.69.0.0/25 --gateway 172.69.0.1 kafka_zoo
version: '2'
services:
  broker1:
    image: wurstmeister/kafka
    restart: always
    hostname: broker1
    container_name: broker1
    ports:
      - "9091:9091"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_HOST_NAME: broker1
      KAFKA_ADVERTISED_PORT: 9091
      KAFKA_HOST_NAME: broker1
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker1:9091
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9091
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker1/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.11
  broker2:
    image: wurstmeister/kafka
    restart: always
    hostname: broker2
    container_name: broker2
    ports:
      - "9092:9092"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_HOST_NAME: broker2
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_HOST_NAME: broker2
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker2:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker2/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.12
  broker3:
    image: wurstmeister/kafka
    restart: always
    hostname: broker3
    container_name: broker3
    ports:
      - "9093:9093"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ADVERTISED_HOST_NAME: broker3
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_HOST_NAME: broker3
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker3:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9093
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker3/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.13
  kafka-manager:
    image: sheepkiller/kafka-manager
    restart: always
    container_name: kafka-manager
    hostname: kafka-manager
    ports:
      - "9002:9000"
    links:            # Link containers created by this compose file
      - broker1
      - broker2
      - broker3
    external_links:   # Link containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    environment:
      ZK_HOSTS: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_BROKERS: broker1:9091,broker2:9092,broker3:9093
      APPLICATION_SECRET: letmein
      KM_ARGS: -Djava.net.preferIPv4Stack=true
    networks:
      default:
        ipv4_address: 172.69.0.10
networks:
  default:
    external:
      name: kafka_zoo


# mkdir -p /root/kafka/broker1
# mkdir -p /root/kafka/broker2
# mkdir -p /root/kafka/broker3
 
