Install ZooKeeper, Kafka, and Druid with Docker to ingest data

Posted by jmcc on Sat, 22 Jan 2022 06:30:48 +0100

All data in Druid is organized into segments, which are data files that typically contain up to a few million rows each. Loading data into Druid is called ingestion: it consists of reading data from a source system and creating segments based on that data. In most ingestion methods, the work of loading data is done by the MiddleManager process.

Druid data is stored in data sources, which are similar to tables in a traditional relational database.

Druid divides the dataset into three parts:

Timestamp column: the time field is handled separately, because all Druid queries revolve around the time axis.
Dimension columns: dimensions are attributes of the data, stored as-is; they can be grouped, filtered, or aggregated in specific ways at query time.
Metric columns: metrics are columns stored as aggregates and used for aggregation calculations. They are usually measurable values that can be counted, summed, and so on.
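As a concrete example, one of the sample records ingested later in this article maps onto these three parts as follows: timestamp is the timestamp column, srcIP and dstIP are dimension columns, and packets and bytes are metric columns.

{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}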

The goal of this project is to use Druid to store vehicle positioning information.

apache druid installation

Apache Druid depends on several other pieces of software, so these dependent environments must be installed before installing Druid.

These dependent environments include:

jdk
zookeeper
kafka

In the commands that follow, the IP addresses need to be changed to your own IP addresses, otherwise startup will fail.

zookeeper installation

#Start the container:
docker run -id --name my_zookeeper -p 2181:2181 -v /etc/localtime:/etc/localtime zookeeper:3.4.14
#Check the container logs:
docker logs -f my_zookeeper

kafka installation

Download and install
#Pull the image
docker pull wurstmeister/kafka
#Start the container
docker run -id --name kafka -p 9092:9092 -e KAFKA_BROKER_ID=0 -e KAFKA_ZOOKEEPER_CONNECT=68.79.63.42:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://68.79.63.42:9092 -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 -v /etc/localtime:/etc/localtime wurstmeister/kafka

#Check the container logs:
docker logs -f kafka

Verification

After the installation is complete, you need to test if the installation was successful:

#Enter the kafka container
docker exec -it kafka /bin/bash
#Enter the bin directory
cd /opt/kafka_2.12-2.5.0/bin/
#Create a topic named test with one replica and one partition
./kafka-topics.sh --create --zookeeper 68.79.63.42:2181 --replication-factor 1 --partitions 1 --topic test
#List topics to confirm the topic was created
./kafka-topics.sh --list --zookeeper 68.79.63.42:2181



Below, a console producer and consumer are used to verify that messages on the test topic can be sent and received. For example:
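#Start a console producer in one terminal and type a few test messages into the test topic
./kafka-console-producer.sh --broker-list 68.79.63.42:9092 --topic test
#In another terminal, start a console consumer to receive the messages
./kafka-console-consumer.sh --bootstrap-server 68.79.63.42:9092 --topic test --from-beginning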

Install druid

cd /root
wget https://archive.apache.org/dist/incubator/druid/0.15.0-incubating/apache-druid-0.15.0-incubating-bin.tar.gz

Installation:

tar -zxvf apache-druid-0.15.0-incubating-bin.tar.gz
mv apache-druid-0.15.0-incubating druid
​
#Druid is now installed in the /root directory, i.e. /root/druid

Configuration

# Because an external ZooKeeper is already running on port 2181, remove 2181 from the port check
vi /root/druid/bin/verify-default-ports
my @ports = (1527, 8081, 8082, 8083, 8090, 8091, 8200, 9095);
​
# Comment out the ZooKeeper entry in the supervise configuration
vi /root/druid/conf/supervise/single-server/small.conf
#!p10 zk bin/run-zk conf
​
# Modify the common configuration
vi /root/druid/conf/druid/single-server/small/_common/common.runtime.properties
#
# Hostname
#
druid.host=127.0.0.1
​
#
# Zookeeper
#
druid.zk.service.host=127.0.0.1
druid.zk.paths.base=/druid
​
# Set the time zone for each component
vi /root/druid/conf/druid/single-server/small/broker/jvm.config
-Duser.timezone=UTC+8
​
vi /root/druid/conf/druid/single-server/small/coordinator-overlord/jvm.config
-Duser.timezone=UTC+8
​
vi /root/druid/conf/druid/single-server/small/historical/jvm.config
-Duser.timezone=UTC+8
​
vi /root/druid/conf/druid/single-server/small/middleManager/jvm.config
-Duser.timezone=UTC+8
vi /root/druid/conf/druid/single-server/small/router/jvm.config
-Duser.timezone=UTC+8
​

Startup

Execute the startup script:

/root/druid/bin/start-single-server-small

Access after startup: http://{druid service ip}:8888

apache druid data ingestion

Prerequisites:
Start zk (port 2181)
Start kafka (port 9092)
Start druid

Ingest data from local files

Case 1
Step 1: Access the Druid console http://{druid service ip}:8888/

Step 2: Click Load data to enter the Load Data page and enter the following parameters:
Enter quickstart/tutorial/ as the Base directory and wikiticker-2015-09-12-sampled.json.gz as the File filter.

Step 3: Click Next:Parse data to go to the next page

Step 4: Click Next:Parse time to go to the next page

Step 5: Click Next:Transform to go to the next page

Step 6: Click Next:Filter to go to the next page

Step 7: Click Next:Configure schema to go to the next page

Step 8: Click Next:Partition to go to the next page

Step 9: Click Next:Tune to go to the next page

Step 10: Click Next:Publish to go to the next page

Step 11: Click Next:Edit JSON spec to go to the next page

You can see that the JSON shown on this page is actually the data ingestion specification; it is ultimately generated from the parameters set on each of the previous pages.

Data ingestion task specifications can be written by hand or generated with the built-in data loader in the Druid console. The data loader helps you build an ingestion spec by sampling the input data and configuring various ingestion parameters. In this Druid version the data loader only supports local batch ingestion (streaming support, including data stored in Apache Kafka and AWS Kinesis, is planned for future releases). For now, streaming ingestion can only be set up by writing the ingestion spec manually.

Step 12: Click the Submit button to submit a task and jump to the task page

Step 13: When the task is finished, click Datasources to enter the data source page and you will see the mydatasource data source

Step 14: Click Query to enter the data query page to query the data
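For example, assuming the data source was published with the name mydatasource in the Publish step (and noting that channel is one of the dimensions in the wikiticker sample data), a quick sanity check on the Query page could be:

SELECT channel, COUNT(*) AS edits
FROM mydatasource
GROUP BY channel
ORDER BY edits DESC
LIMIT 10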

Ingest data from Kafka

Preparation:

#Enter the kafka container
docker exec -it kafka /bin/bash
#Enter the bin directory
cd /opt/kafka_2.12-2.5.0/bin/
#Create a topic named ip_msg with one replica and one partition
./kafka-topics.sh --create --zookeeper 68.79.63.42:2181 --replication-factor 1 --partitions 1 --topic ip_msg
#List topics to confirm the topic was created
./kafka-topics.sh --list --zookeeper 68.79.63.42:2181

Step 1: Access the Druid console http://{druid service ip}:8888/

Step 2: Click the Tasks button to enter the task page

Step 3: Click the Submit supervisor button to pop up the window

Step 4: Enter the following in the pop-up window to configure the data ingestion policy (to ingest data from Kafka's ip_msg topic)

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "ip_msg",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "iso"
        },
        "dimensionsSpec": {
          "dimensions": [
            "srcIP",
            "dstIP"
          ]
        }
      }
    },
    "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "week",
      "queryGranularity": "minute",
      "rollup": true
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "reportParseExceptions": false
  },
  "ioConfig": {
    "topic": "ip_msg",
    "replicas": 1,
    "taskDuration": "PT10M",
    "completionTimeout": "PT20M",
    "consumerProperties": {
      "bootstrap.servers": "39.99.56.166:9092"
    }
  }
}
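Instead of using the console, the same supervisor spec can also be submitted directly to Druid's Overlord API (port 8090 in the port list above). A minimal sketch, assuming the spec is saved in a file named kafka-supervisor.json (the filename is only an example):

curl -X POST -H 'Content-Type: application/json' -d @kafka-supervisor.json http://{druid service ip}:8090/druid/indexer/v1/supervisor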

Step 5: Execute the command to send messages to the Kafka topic

#Start a producer and send messages to the ip_msg topic

./kafka-console-producer.sh --broker-list 68.79.63.42:9092 --topic ip_msg

The messages are as follows:

{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
{"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
{"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
{"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
{"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
{"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}

Step 6: Query the data on the Query page of the druid console
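For example, the following SQL (using the srcIP, dstIP, packets, bytes, and count columns defined in the spec above) aggregates the ingested traffic per source/destination pair:

SELECT srcIP, dstIP, SUM(packets) AS total_packets, SUM(bytes) AS total_bytes, SUM("count") AS events
FROM ip_msg
GROUP BY srcIP, dstIP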

Java client operations on Druid

Import maven coordinates

<dependency>
    <groupId>org.apache.calcite.avatica</groupId>
    <artifactId>avatica-core</artifactId>
    <version>1.15.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite.avatica</groupId>
    <artifactId>avatica</artifactId>
    <version>1.15.0</version>
</dependency>

Code

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Any class name will do; the main method below is the complete example
public class DruidQueryDemo {
    public static void main(String[] args) throws Exception {
        // Register the Avatica JDBC driver and connect to Druid's SQL endpoint
        Class.forName("org.apache.calcite.avatica.remote.Driver");
        Connection connection = DriverManager.getConnection("jdbc:avatica:remote:url=http://68.79.63.42:8888/druid/v2/sql/avatica/");
        Statement st = null;
        ResultSet rs = null;
        try {
            st = connection.createStatement();
            rs = st.executeQuery("select currentTime,name,phone,licensePlate,businessId,lat,lng from tms_order_location");
            ResultSetMetaData rsmd = rs.getMetaData();
            List<Map<String, Object>> resultList = new ArrayList<>();
            while (rs.next()) {
                Map<String, Object> map = new HashMap<>();
                // JDBC column indexes are 1-based
                for (int i = 0; i < rsmd.getColumnCount(); i++) {
                    String columnName = rsmd.getColumnName(i + 1);
                    map.put(columnName, rs.getObject(columnName));
                }
                resultList.add(map);
            }
            System.out.println(resultList);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                connection.close();
            } catch (SQLException e) {
            }
        }
    }
}

netty development

The purpose of the Netty service is to receive the vehicle location information reported by drivers and send it to the Kafka queue. It provides two ways to receive the reported positioning information: a TCP channel based on Netty, and an HTTP interface.

1 HTTP Interface Mode

Step 1: Write bootstrap.yml

server:
  tomcat:
    uri-encoding: UTF-8
    max-threads: 1000
    min-spare-threads: 30
  port: 8192
  connection-timeout: 50000ms
​
spring:
  application:
    name: pd-netty
  profiles:
      active: dev
  main:
    allow-bean-definition-overriding: true

Step 2: Write bootstrap-dev.yml

spring:
  cloud:
    nacos:
      username: tms
      password: itheima123
      discovery:
        server-addr: 68.79.63.42:8848
        group: pinda-tms
        namespace: 301c574e-ae40-42c4-b717-d5379057f6a4
      config:
        server-addr: 68.79.63.42:8848
        file-extension: yml
        group: pinda-tms
        namespace: 301c574e-ae40-42c4-b717-d5379057f6a4
  # jackson time format
  jackson:
    time-zone: ${spring.jackson.time-zone}
    date-format: ${spring.jackson.date-format}
  servlet:
    multipart:
      max-file-size: ${spring.servlet.multipart.max-file-size}
      max-request-size: ${spring.servlet.multipart.max-request-size}
      enabled: ${spring.servlet.multipart.enabled}
  # kafka
  kafka:
    bootstrap-servers: ${spring.kafka.bootstrap-servers}
    listener:   # Specifies the number of threads in the listener container to increase concurrency
      concurrency: ${spring.kafka.listener.concurrency}
    producer:
      retries: ${spring.kafka.producer.retries}
      batch-size: ${spring.kafka.producer.batch-size}
      buffer-memory: ${spring.kafka.producer.buffer-memory}
      key-serializer: ${spring.kafka.producer.key-serializer}
      value-serializer: ${spring.kafka.producer.value-serializer}
    consumer:
      group-id: ${spring.kafka.consumer.group-id}
      

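Most entries in bootstrap-dev.yml are placeholders such as ${spring.kafka.bootstrap-servers}; the actual values are expected to be supplied by the shared configuration in Nacos. A minimal sketch of what the Kafka-related part of that Nacos configuration might look like (the concrete values below are assumptions and must be adapted to your environment):

spring:
  kafka:
    bootstrap-servers: 68.79.63.42:9092
    listener:
      concurrency: 3          # number of listener threads (example value)
    producer:
      retries: 3              # example value
      batch-size: 16384       # example value
      buffer-memory: 33554432 # example value
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: pd-netty      # example group id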
Step 3: Write the LocationEntity entity

package com.itheima.pinda.entity;
​
import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.Data;
​
@Data
@ApiModel("position information")
public class LocationEntity {
    public String getId() {
        return businessId + "#" + type + "#" + currentTime;
    }
​
    /**
     * Business id
     */
    @ApiModelProperty("Business id: courier id or vehicle id")
    private String businessId;
​
    /**
     * Driver name
     */
    @ApiModelProperty("Driver name")
    private String name;
​
    /**
     * Driver's phone
     */
    @ApiModelProperty("Driver's phone")
    private String phone;
​
    /**
     * License plate number
     */
    @ApiModelProperty("licensePlate")
    private String licensePlate;
​
    /**
     * type
     */
    @ApiModelProperty("Type, Vehicle: truck,Courier: courier")
    private String type;
​
    /**
     * longitude
     */
    @ApiModelProperty("longitude")
    private String lng;
​
    /**
     * latitude
     */
    @ApiModelProperty("latitude")
    private String lat;
​
    /**
     * current time
     */
    @ApiModelProperty("Current time format: yyyyMMddHHmmss")
    private String currentTime;
​
    @ApiModelProperty("Owning Fleet")
    private String team;
​
    @ApiModelProperty("Transport Tasks id")
    private String transportTaskId;
}

Step 4: Write the startup class

package com.itheima.pinda;
​
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
​
@SpringBootApplication
@EnableDiscoveryClient
public class NettyApplication {
​
    public static void main(String[] args) {
        SpringApplication.run(NettyApplication.class, args);
    }
}

Step 5: Write a KafkaSender to send messages to Kafka

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
​
@Component
@Slf4j
public class KafkaSender {
    public final static String MSG_TOPIC = "tms_order_location";
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;
    private static KafkaTemplate<String, String> template;
​
    @PostConstruct
    public void init() {
        KafkaSender.template = this.kafkaTemplate;
    }
​
    //Send message to kafka queue
    public static boolean send(String topic, String message) {
        try {
            template.send(topic, message);
            log.info("Message sent successfully:{} , {}", topic, message);
        } catch (Exception e) {
            log.error("Message sending failed:{} , {}", topic, message, e);
            return false;
        }
        return true;
    }
​
}

Step 6: Write a NettyController to accept the vehicle location information reported by drivers via the HTTP interface

​
@RestController
@Api(tags = "Vehicle Track Service")
@RequestMapping("netty")
@Slf4j
public class NettyController {
    @PostMapping(value = "/push")
    public Result push(@RequestBody LocationEntity locationEntity) {
        String message = JSON.toJSONString(locationEntity);
        log.info("HTTP How to push location information:{}", message);
        KafkaSender.send(KafkaSender.MSG_TOPIC, message);
        return Result.ok();
    }
}
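To quickly test the HTTP interface, a sample location can be posted to the push endpoint. A minimal sketch, assuming the service runs locally on the port 8192 configured in bootstrap.yml (the host and field values are examples only):

curl -X POST http://localhost:8192/netty/push \
  -H 'Content-Type: application/json' \
  -d '{"businessId":"1001","name":"Zhang San","phone":"13800000000","licensePlate":"A12345","type":"truck","lng":"116.40","lat":"39.90","currentTime":"20220122103000","team":"team01","transportTaskId":"T001"}'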

2 TCP Mode

Netty is an open-source Java framework provided by JBOSS. It offers an asynchronous, event-driven framework and tools for rapidly developing high-performance, highly reliable network servers and clients.
Step 1: Write the NettyServer service startup class

package com.itheima.pinda.config;
​
import com.itheima.pinda.service.NettyServerHandler;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
​
/**
 * netty Service Startup Class
 */
@Component
@Slf4j
public class NettyServer implements CommandLineRunner {
    private static NettyServer nettyServer;
​
    @PostConstruct
    public void init() {
        nettyServer = this;
    }
​
    @Value("${netty.port}")
    private int port;
​
    private EventLoopGroup mainGroup;
    private EventLoopGroup subGroup;
    private ServerBootstrap server;
    private ChannelFuture future;
​
    public NettyServer() {
        // NIO Thread Group for handling network events
        mainGroup = new NioEventLoopGroup();
        subGroup = new NioEventLoopGroup();
        // Service Initialization Tool encapsulates complex code to initialize services
        server = new ServerBootstrap();
        
        server.group(mainGroup, subGroup)
                .option(ChannelOption.SO_BACKLOG, 128)// Set the size of the pending-connection backlog queue
                .childOption(ChannelOption.SO_KEEPALIVE, true)
                .channel(NioServerSocketChannel.class)// Specify using NioServerSocketChannel to generate a Channel to receive connections
                .childHandler(new NettyServerHandler());//Handle Network IO Events Specifically
​
    }
​
    public void start() {
        // Start server, bind port
        this.future = server.bind(nettyServer.port);
        log.info("Netty Server Start Completed!!!!  Port:" + nettyServer.port);
    }
​
    @Override
    public void run(String... args) {
        this.start();
    }
}

Step 2: Write the NettyServerHandler processing class

import com.alibaba.fastjson.JSON;
import com.itheima.pinda.entity.LocationEntity;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang.StringUtils;
import java.io.UnsupportedEncodingException;
​
/**
 * netty Business Processing
 */
@Slf4j
@ChannelHandler.Sharable
public class NettyServerHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        log.info("ServerHandler.channelRead()");
        ByteBuf in = (ByteBuf) msg;
        try {
            //Receive Messages
            String body = getRequestBody(in);
            log.info("Message Content:{}", body);
​
            //Parse message
            String message = parseMessage(body);
            if (StringUtils.isBlank(message)) {
                log.info("Message parsing failed");
                return;
            }
​
            //Send to kafka queue
            KafkaSender.send(KafkaSender.MSG_TOPIC, message);
​
        } catch (Exception e) {
            log.error(e.getMessage());
        } finally {
            if (null != in) {
                //Restore the current write pointer writerIndex to the previously saved markedWriterIndex value before releasing the buffer
                in.resetWriterIndex();
            }
            //After using the ByteBuf you must release it explicitly, otherwise it stays in memory and can easily cause a memory leak
            ReferenceCountUtil.release(msg);
        }
    }
​
    /**
     * Resolve Request Content
     *
     * @param in
     * @return
     * @throws UnsupportedEncodingException
     */
    private String getRequestBody(ByteBuf in) throws UnsupportedEncodingException {
        if (in.readableBytes() <= 0) {
            return null;
        }
        byte[] req = new byte[in.readableBytes()];
        in.readBytes(req);
        return new String(req, "UTF-8");
    }
​
    /**
     * Parse message
     * <p>
     * Different devices send messages in different formats; this device is a mobile phone, which transmits JSON directly
     */
    private String parseMessage(String body) {
        if (StringUtils.isBlank(body)) {
            log.warn("Message is empty");
            return null;
        }
        body = body.trim();
        // Messages in other formats need to be parsed and placed into the MessageEntity entity
        LocationEntity message = JSON.parseObject(body, LocationEntity.class);
        if (message == null || StringUtils.isBlank(message.getType()) || StringUtils.isBlank(message.getBusinessId()) || StringUtils.isBlank(message.getLat()) || StringUtils.isBlank(message.getLng()) || StringUtils.isBlank(message.getId())) {
            log.warn("Message Content Exception");
            return null;
        }
​
        String result = JSON.toJSONString(message);
        return result;
    }
​
    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        // Write an empty buf and refresh the write out area. Close the sock channel connection when finished.
        ctx.writeAndFlush(Unpooled.EMPTY_BUFFER).addListener(ChannelFutureListener.CLOSE);
    }
​
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Close connection with exception
        ctx.close();
    }
}
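The TCP mode can be exercised with any client that writes a single JSON message to the Netty port. Below is a minimal sketch of a test client, assuming the server listens on the port configured as netty.port (the port 8191 and the field values are examples only):

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LocationTcpClientDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the Netty server; replace host and port with your netty.port setting
        try (Socket socket = new Socket("127.0.0.1", 8191)) {
            // businessId, type, currentTime, lat and lng are checked by NettyServerHandler.parseMessage
            String json = "{\"businessId\":\"1001\",\"type\":\"truck\",\"currentTime\":\"20220122103000\","
                    + "\"lat\":\"39.90\",\"lng\":\"116.40\",\"name\":\"Zhang San\",\"phone\":\"13800000000\","
                    + "\"licensePlate\":\"A12345\"}";
            OutputStream out = socket.getOutputStream();
            // The handler reads the whole ByteBuf as a UTF-8 string and parses it as JSON
            out.write(json.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}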

Topics: Docker kafka Zookeeper