Blog | Using Apache Pulsar in Kotlin

Posted by blackandwhite on Tue, 01 Mar 2022 08:48:16 +0100

About Apache Pulsar

Apache Pulsar is a top-level project of the Apache Software Foundation. It is a next-generation, cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight functional computing. It uses an architecture that separates compute from storage, supports multi-tenancy, persistent storage, and cross-region data replication across multiple data centers, and offers streaming storage characteristics such as strong consistency, high throughput, low latency, and high scalability.
GitHub address: http://github.com/apache/pulsar/

This article is translated from: Using Apache Pulsar With Kotlin, by Gilles Barbier.
Original link: https://gillesbarbier.medium.com/using-apache-pulsar-with-kotlin-3b0ab398cf52

About the translator

Song Bo is a senior development engineer at Beijing Baiguan Technology Co., Ltd., focusing on microservices, cloud computing, and big data.

Apache Pulsar[1], often described as the next-generation Kafka, is a rising star in the developer toolset. Pulsar is a multi-tenant, high-performance solution for server-to-server messaging, and it is often used as the core of scalable applications.

Pulsar can be used with Kotlin[2] because it is written in Java. However, its API does not take advantage of the powerful features Kotlin offers, such as data classes [3], coroutines [4], or reflection-free serialization [5].

In this article, I will discuss how to use Pulsar from Kotlin.

Using native serialization for message bodies

The usual way to define messages in Kotlin is to use data classes [6], whose main purpose is to hold data. For such classes, Kotlin automatically provides methods such as equals(), toString(), and copy(), which shortens the code and reduces the risk of errors.
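As a small illustration of those generated methods, here is a minimal sketch using only the Kotlin standard library. The TaskStarted class and the nextAttempt helper are hypothetical names made up for this example; they are not part of the article's code.

```kotlin
// Hypothetical message type, used only to illustrate data-class behaviour.
data class TaskStarted(val taskId: String, val attempt: Int)

// Returns a copy of the message with the attempt counter incremented.
// copy() is generated by the compiler; the original instance is left unchanged.
fun nextAttempt(msg: TaskStarted): TaskStarted = msg.copy(attempt = msg.attempt + 1)

// Renders the message with the compiler-generated toString().
fun describe(msg: TaskStarted): String = msg.toString()
```

Two instances built from the same values compare equal with ==, and describe(TaskStarted("t-1", 1)) yields "TaskStarted(taskId=t-1, attempt=1)", all without any hand-written boilerplate.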

Creating a Pulsar producer in Java [7] looks like this:

Producer<MyAvro> avroProducer = client
  .newProducer(Schema.AVRO(MyAvro.class))
  .topic("some-avro-topic")
  .create();

The Schema.AVRO(MyAvro.class) call introspects the MyAvro Java class and infers a schema from it. This is needed to verify that a new producer will produce messages that are actually compatible with existing consumers. However, the Java implementation of Kotlin data classes does not work well with the default serializer used by Pulsar. Fortunately, starting with version 2.7.0, Pulsar lets you use custom serializers for producers and consumers.

First, you need to install the Kotlin serialization plug-in [8]. Use it to create a message class as follows:

@Serializable
data class RunTask(
    val taskName: TaskName,
    val taskId: TaskId,
    val taskInput: TaskInput,
    val taskOptions: TaskOptions,
    val taskMeta: TaskMeta
)

Note the @Serializable annotation.
With it, RunTask.serializer() gives you a serializer that works without introspection, which greatly improves efficiency!
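To make the generated serializer concrete, here is a minimal sketch of a JSON round trip. It assumes the Kotlin serialization compiler plug-in is applied and that the kotlinx-serialization-json library is on the classpath. TaskPing, encodePing, and decodePing are hypothetical names used for illustration; the article's RunTask class would be used in the same way.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical message type; the plug-in generates TaskPing.serializer() at compile time.
@Serializable
data class TaskPing(val taskId: String, val attempt: Int)

// Encode with the compiler-generated serializer (no reflection involved).
fun encodePing(ping: TaskPing): String =
    Json.encodeToString(TaskPing.serializer(), ping)

// Decode back into a TaskPing with the same generated serializer.
fun decodePing(json: String): TaskPing =
    Json.decodeFromString(TaskPing.serializer(), json)
```

Passing the serializer explicitly, rather than relying on reified type lookups, mirrors how the serializer is handed to Avro-based helpers later on.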

Currently, the serialization plug-in only supports JSON as a stable format (plus a few others, such as protobuf, in beta). So we also need the avro4k[9] library, which extends it to support the Avro format.

Using these tools, we can create a producer and a consumer for RunTask messages as follows:

import com.github.avrokotlin.avro4k.Avro
import com.github.avrokotlin.avro4k.io.AvroEncodeFormat
import io.infinitic.common.tasks.executors.messages.RunTask
import kotlinx.serialization.KSerializer
import org.apache.avro.file.SeekableByteArrayInput
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DecoderFactory
import org.apache.pulsar.client.api.Consumer
import org.apache.pulsar.client.api.Producer
import org.apache.pulsar.client.api.PulsarClient
import org.apache.pulsar.client.api.Schema
import org.apache.pulsar.client.api.schema.SchemaDefinition
import org.apache.pulsar.client.api.schema.SchemaReader
import org.apache.pulsar.client.api.schema.SchemaWriter
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Convert T instance to Avro schemaless binary format
fun <T : Any> writeBinary(t: T, serializer: KSerializer<T>): ByteArray {
    val out = ByteArrayOutputStream()
    Avro.default.openOutputStream(serializer) {
        encodeFormat = AvroEncodeFormat.Binary
        schema = Avro.default.schema(serializer)
    }.to(out).write(t).close()

    return out.toByteArray()
}

// Convert Avro schemaless byte array to T instance
fun <T> readBinary(bytes: ByteArray, serializer: KSerializer<T>): T {
    val datumReader = GenericDatumReader<GenericRecord>(Avro.default.schema(serializer))
    val decoder = DecoderFactory.get().binaryDecoder(SeekableByteArrayInput(bytes), null)

    return Avro.default.fromRecord(serializer, datumReader.read(null, decoder))
}

// custom Pulsar SchemaReader
class RunTaskSchemaReader: SchemaReader<RunTask> {
    override fun read(bytes: ByteArray, offset: Int, length: Int) =
        read(bytes.inputStream(offset, length))

    override fun read(inputStream: InputStream) =
        readBinary(inputStream.readBytes(), RunTask.serializer())
}

// custom Pulsar SchemaWriter
class RunTaskSchemaWriter : SchemaWriter<RunTask> {
    override fun write(message: RunTask) = writeBinary(message, RunTask.serializer())
}

// custom Pulsar SchemaDefinition<RunTask>
fun runTaskSchemaDefinition(): SchemaDefinition<RunTask> =
    SchemaDefinition.builder<RunTask>()
        .withJsonDef(Avro.default.schema(RunTask.serializer()).toString())
        .withSchemaReader(RunTaskSchemaReader())
        .withSchemaWriter(RunTaskSchemaWriter())
        .withSupportSchemaVersioning(true)
        .build()

// Create an instance of Producer<RunTask>
fun runTaskProducer(client: PulsarClient): Producer<RunTask> = client
    .newProducer(Schema.AVRO(runTaskSchemaDefinition()))
    .topic("some-avro-topic")
    .create()

// Create an instance of Consumer<RunTask>
// Note: Pulsar requires a subscription name; "some-subscription" is a placeholder
fun runTaskConsumer(client: PulsarClient): Consumer<RunTask> = client
    .newConsumer(Schema.AVRO(runTaskSchemaDefinition()))
    .topic("some-avro-topic")
    .subscriptionName("some-subscription")
    .subscribe()

Sealed class messages and one envelope per topic

Pulsar allows only one message type per topic. In some cases this is not enough, but the limitation can be worked around with the envelope pattern.

First, define all the message types of a topic using a sealed class:

@Serializable
sealed class TaskEngineMessage() {
    abstract val taskId: TaskId
}

@Serializable
data class DispatchTask(
    override val taskId: TaskId,
    val taskName: TaskName,
    val methodName: MethodName,
    val methodParameterTypes: MethodParameterTypes?,
    val methodInput: MethodInput,
    val workflowId: WorkflowId?,
    val methodRunId: MethodRunId?,
    val taskMeta: TaskMeta,
    val taskOptions: TaskOptions = TaskOptions()
) : TaskEngineMessage()

@Serializable
data class CancelTask(
    override val taskId: TaskId,
    val taskOutput: MethodOutput
) : TaskEngineMessage()

@Serializable
data class TaskCanceled(
    override val taskId: TaskId,
    val taskOutput: MethodOutput,
    val taskMeta: TaskMeta
) : TaskEngineMessage()

@Serializable
data class TaskCompleted(
    override val taskId: TaskId,
    val taskName: TaskName,
    val taskOutput: MethodOutput,
    val taskMeta: TaskMeta
) : TaskEngineMessage()

Then, create an envelope for these messages:

@Serializable
data class TaskEngineEnvelope(
    val taskId: TaskId,
    val type: TaskEngineMessageType,
    val dispatchTask: DispatchTask? = null,
    val cancelTask: CancelTask? = null,
    val taskCanceled: TaskCanceled? = null,
    val taskCompleted: TaskCompleted? = null,
) {
    init {
        val noNull = listOfNotNull(
            dispatchTask,
            cancelTask,
            taskCanceled,
            taskCompleted
        )

        require(noNull.size == 1)
        require(noNull.first() == message())
        require(noNull.first().taskId == taskId)
    }

    companion object {
        fun from(msg: TaskEngineMessage) = when (msg) {
            is DispatchTask -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.DISPATCH_TASK,
                dispatchTask = msg
            )
            is CancelTask -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.CANCEL_TASK,
                cancelTask = msg
            )
            is TaskCanceled -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.TASK_CANCELED,
                taskCanceled = msg
            )
            is TaskCompleted -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.TASK_COMPLETED,
                taskCompleted = msg
            )
        }
    }

    fun message(): TaskEngineMessage = when (type) {
        TaskEngineMessageType.DISPATCH_TASK -> dispatchTask!!
        TaskEngineMessageType.CANCEL_TASK -> cancelTask!!
        TaskEngineMessageType.TASK_CANCELED -> taskCanceled!!
        TaskEngineMessageType.TASK_COMPLETED -> taskCompleted!!
    }
}

enum class TaskEngineMessageType {
    CANCEL_TASK,
    DISPATCH_TASK,
    TASK_CANCELED,
    TASK_COMPLETED
}

Notice how elegantly Kotlin lets you validate the envelope in init! You can easily create an envelope with TaskEngineEnvelope.from(msg) and get the original message back with envelope.message().
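The round trip and the init checks can be tried out with a stripped-down sketch of the same pattern. Everything here (EngineMessage, Dispatch, Cancel, Envelope, EngineMessageType) is a simplified, hypothetical stand-in for the article's types, with plain strings instead of value classes and no serialization.

```kotlin
// Simplified, hypothetical stand-ins for the article's message types.
sealed class EngineMessage { abstract val taskId: String }
data class Dispatch(override val taskId: String, val taskName: String) : EngineMessage()
data class Cancel(override val taskId: String) : EngineMessage()

enum class EngineMessageType { DISPATCH, CANCEL }

data class Envelope(
    val taskId: String,
    val type: EngineMessageType,
    val dispatch: Dispatch? = null,
    val cancel: Cancel? = null
) {
    init {
        // Exactly one payload may be set, and it must agree with type and taskId.
        val nonNull = listOfNotNull(dispatch, cancel)
        require(nonNull.size == 1) { "exactly one payload must be set" }
        require(nonNull.first() == message()) { "payload must match type" }
        require(nonNull.first().taskId == taskId) { "taskId must match payload" }
    }

    companion object {
        // Wrap any message in an envelope carrying the matching type tag.
        fun from(msg: EngineMessage): Envelope = when (msg) {
            is Dispatch -> Envelope(msg.taskId, EngineMessageType.DISPATCH, dispatch = msg)
            is Cancel -> Envelope(msg.taskId, EngineMessageType.CANCEL, cancel = msg)
        }
    }

    // Unwrap the payload selected by the type tag.
    fun message(): EngineMessage = when (type) {
        EngineMessageType.DISPATCH -> dispatch
        EngineMessageType.CANCEL -> cancel
    } ?: error("no payload for type $type")
}
```

With this sketch, Envelope.from(Cancel("t-9")).message() returns the original Cancel instance, while constructing an envelope whose taskId differs from its payload's taskId fails with an IllegalArgumentException. Unlike the article's version, which uses !!, this variant reports a missing payload through error().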

Why add an explicit taskId value and one field per message type, instead of a single global field message: TaskEngineMessage? Because this way I can use PulsarSQL[10] to query this topic by taskId, by type, or by a combination of both.

Building workers with coroutines

Working with threads in plain Java is complex and error-prone. Fortunately, Kotlin provides coroutines[11], a simpler abstraction for asynchronous processing, and channels[12], a convenient way to transfer data between coroutines.
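For readers new to these two building blocks, here is a minimal sketch of coroutines exchanging data over channels. It assumes the kotlinx-coroutines-core library is available; squareInParallel is a hypothetical function made up for this example and has nothing Pulsar-specific in it.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Fan out `inputs` to `workers` coroutines over one channel and collect the squares.
suspend fun squareInParallel(inputs: List<Int>, workers: Int): List<Int> = coroutineScope {
    val input = Channel<Int>()
    val results = Channel<Int>()

    // Producer: sends every input, then closes the channel so workers can finish.
    launch {
        inputs.forEach { input.send(it) }
        input.close()
    }

    // Workers: each one takes items from the shared input channel.
    val jobs = List(workers) {
        launch {
            for (n in input) results.send(n * n)
        }
    }

    // Close the results channel once every worker is done.
    launch {
        jobs.forEach { it.join() }
        results.close()
    }

    // Collector: drain the results; arrival order depends on scheduling, so sort.
    val out = mutableListOf<Int>()
    for (r in results) out += r
    out.sorted()
}
```

Called from a coroutine (for example inside runBlocking), squareInParallel(listOf(1, 2, 3, 4), 2) returns [1, 4, 9, 16]. Closing a channel is what ends the for loops on the receiving side.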

I can build a worker out of:

• a single coroutine ("task-engine-message-puller") dedicated to pulling messages from Pulsar
• N coroutines ("task-engine-$i") processing messages in parallel
• a single coroutine ("task-engine-message-acknowledger") acknowledging Pulsar messages after processing

Since there are many coroutines like these, I also added a logChannel to collect logs in one place. Note that in order to acknowledge a Pulsar message in a coroutine other than the one that received it, I need to wrap the TaskEngineMessage in a MessageToProcess<TaskEngineMessage> that carries the Pulsar messageId:

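import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.SendChannel
import kotlinx.coroutines.future.await
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import org.apache.pulsar.client.api.Consumer
import org.apache.pulsar.client.api.Message
import org.apache.pulsar.client.api.MessageId

typealias TaskEngineMessageToProcess = MessageToProcess<TaskEngineMessage>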

fun CoroutineScope.startPulsarTaskEngineWorker(
    taskEngineConsumer: Consumer<TaskEngineEnvelope>,
    taskEngine: TaskEngine,
    logChannel: SendChannel<TaskEngineMessageToProcess>?,
    enginesNumber: Int
) = launch(Dispatchers.IO) {

    val taskInputChannel = Channel<TaskEngineMessageToProcess>()
    val taskResultsChannel = Channel<TaskEngineMessageToProcess>()

    // coroutine dedicated to pulsar message pulling
    launch(CoroutineName("task-engine-message-puller")) {
        while (isActive) {
            val message: Message<TaskEngineEnvelope> = taskEngineConsumer.receiveAsync().await()

            try {
                val envelope = readBinary(message.data, TaskEngineEnvelope.serializer())
                taskInputChannel.send(MessageToProcess(envelope.message(), message.messageId))
            } catch (e: Exception) {
                taskEngineConsumer.negativeAcknowledge(message.messageId)
                throw e
            }
        }
    }

    // coroutines dedicated to Task Engine
    repeat(enginesNumber) {
        launch(CoroutineName("task-engine-$it")) {
            for (messageToProcess in taskInputChannel) {
                try {
                    messageToProcess.output = taskEngine.handle(messageToProcess.message)
                } catch (e: Exception) {
                    messageToProcess.exception = e
                }
                taskResultsChannel.send(messageToProcess)
            }
        }
    }

    // coroutine dedicated to pulsar message acknowledging
    launch(CoroutineName("task-engine-message-acknowledger")) {
        for (messageToProcess in taskResultsChannel) {
            if (messageToProcess.exception == null) {
                taskEngineConsumer.acknowledgeAsync(messageToProcess.messageId).await()
            } else {
                taskEngineConsumer.negativeAcknowledge(messageToProcess.messageId)
            }
            logChannel?.send(messageToProcess)
        }
    }
}

data class MessageToProcess<T> (
    val message: T,
    val messageId: MessageId,
    var exception: Exception? = null,
    var output: Any? = null
)

Summary

In this article, we have seen how to use Pulsar from Kotlin to:

• define messages (including envelopes for Pulsar topics that receive several message types)
• create Pulsar producers and consumers
• build a simple worker that can process many messages in parallel

References

[1] Apache Pulsar: https://pulsar.apache.org/zh-CN/
[2] Kotlin: https://kotlinlang.org/
[3] Data class: https://kotlinlang.org/docs/reference/data-classes.html
[4] Coroutines: https://kotlinlang.org/docs/reference/coroutines/coroutines-guide.html
[5] Reflection-free serialization: https://kotlinlang.org/docs/reference/serialization.html#serialization
[6] Data class: https://kotlinlang.org/docs/reference/data-classes.html
[7] Pulsar producer: https://pulsar.apache.org/docs/en/client-libraries-java/#schema-example
[8] Kotlin serialization plug-in: https://github.com/Kotlin/kotlinx.serialization
[9] avro4k: https://github.com/avro-kotlin/avro4k
[10] PulsarSQL: https://pulsar.apache.org/docs/en/sql-overview/
[11] coroutines: https://kotlinlang.org/docs/reference/coroutines/coroutines-guide.html
[12] channels: https://kotlinlang.org/docs/reference/coroutines/channels.html

This article was shared from the WeChat official account ApachePulsar.