Custom metrics to monitor inflow and output

Posted by mrgym on Sat, 18 Dec 2021 19:53:39 +0100

Statement: this series of blog posts is compiled from SGG's videos and is well suited for learning. Some articles were collected by crawlers and other technical means for the purpose of learning and sharing. If there is any copyright problem, please leave a message and it will be deleted promptly.


A Flink task itself exposes many kinds of metric monitoring, down to the inflow/outflow volume, rate, Watermark value, and so on of each Operator. In practice, however, incoming data usually has to be parsed and formatted first, for example into JSON: records that meet the requirements flow downstream, while records that do not, or that fail to parse, are filtered out as dirty data. The goal here is a general way to produce metric statistics for both the normal data and the dirty data.

Implementation idea:

1. Flink metric types are divided into Counter, Gauge, Histogram and Meter. What needs to be tracked here is an accumulating count, so the Counter metric type is chosen.

2. Because this is inflow monitoring, it has to happen at the Source end. The data source is usually Kafka, and Flink already provides a Kafka connector, exposing the deserialization interface DeserializationSchema and the abstract class AbstractDeserializationSchema. Implementing the interface or inheriting the abstract class is what deserializes and formats the data; since every record passes through deserialization, the metric statistics can be collected at the same time.

3. In Flink, the entry point for custom metrics is the RuntimeContext, but the deserialization abstract class has no way to access a RuntimeContext: it is normally only available inside a RichFunction, and of the classes involved here only FlinkKafkaConsumer is one. So the RuntimeContext obtained in FlinkKafkaConsumer can be passed on to the AbstractDeserializationSchema (the usual RichFunction pattern is sketched after this list).
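
For contrast, this is how a Counter is normally registered inside a rich function, where getRuntimeContext() is directly available. A minimal sketch: the class name CountingMapper and the metric name mappedRecords are illustrative and not part of the original code.

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter

class CountingMapper extends RichMapFunction[String, String] {

  @transient private var mapped: Counter = _

  override def open(parameters: Configuration): Unit = {
    // The RuntimeContext is only available inside rich functions such as this
    // one, which is why it has to be handed to the deserializer explicitly.
    mapped = getRuntimeContext.getMetricGroup.counter("mappedRecords")
  }

  override def map(value: String): String = {
    mapped.inc()
    value
  }
}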

Implementation steps:

1. Define an abstract class AbsDeserialization that inherits AbstractDeserializationSchema; it holds a RuntimeContext and two Counters for the statistics, plus an initMetric method that initializes the Counters.

2. Define a class CustomerKafkaConsumer that inherits FlinkKafkaConsumer010; it holds an AbsDeserialization field, a constructor, and overrides the run method. In run, it sets the RuntimeContext object on the AbsDeserialization, calls its initMetric, and finally calls the parent class's run method.

The code is as follows:

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.metrics.Counter;

public abstract class AbsDeserialization<T> extends AbstractDeserializationSchema<T> {

    private RuntimeContext runtimeContext;
    private static final String DIRTY_DATA_NAME = "dirtyDataNum";
    private static final String NORMAL_DATA_NAME = "normalDataNum";

    // Counters are transient: they are registered at runtime and must not be serialized.
    protected transient Counter dirtyDataNum;

    protected transient Counter normalDataNum;

    public RuntimeContext getRuntimeContext() {
        return runtimeContext;
    }

    public void setRuntimeContext(RuntimeContext runtimeContext) {
        this.runtimeContext = runtimeContext;
    }

    // Registers the two counters with the task's metric group; must be called
    // after the RuntimeContext has been set.
    public void initMetric() {
        dirtyDataNum = runtimeContext.getMetricGroup().counter(DIRTY_DATA_NAME);
        normalDataNum = runtimeContext.getMetricGroup().counter(NORMAL_DATA_NAME);
    }

}
import java.util.Properties;

import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class CustomerKafkaConsumer<T> extends FlinkKafkaConsumer010<T> {

    private final AbsDeserialization<T> valueDeserializer;

    public CustomerKafkaConsumer(String topic, AbsDeserialization<T> valueDeserializer, Properties props) {
        super(topic, valueDeserializer, props);
        this.valueDeserializer = valueDeserializer;
    }

    @Override
    public void run(SourceContext<T> sourceContext) throws Exception {
        // The consumer is a rich source function, so the RuntimeContext is
        // available here; hand it to the deserializer and register the
        // metrics before the parent class starts consuming.
        valueDeserializer.setRuntimeContext(getRuntimeContext());
        valueDeserializer.initMetric();
        super.run(sourceContext);
    }
}

To use it, just define a class that inherits AbsDeserialization:

import com.alibaba.fastjson.JSON

class ParseDeserialization extends AbsDeserialization[RawData] {

  override def deserialize(message: Array[Byte]): RawData = {
    try {
      val msg = new String(message)
      val rawData = JSON.parseObject(msg, classOf[RawData])
      normalDataNum.inc() // normal-data metric
      rawData
    } catch {
      case e: Exception =>
        dirtyDataNum.inc() // dirty-data metric
        null // dirty records yield null and should be filtered out downstream
    }
  }

}

Source usage:

val consumer: CustomerKafkaConsumer[RawData] = new CustomerKafkaConsumer[RawData](topic, new ParseDeserialization, kafkaPro)
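
For completeness, a minimal sketch of wiring the source into a job. The environment setup, the print sink, and the job name are assumptions added here; note that the nulls ParseDeserialization returns for dirty records are filtered out before further processing.

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

env.addSource(consumer)
  .filter(_ != null) // drop the nulls produced for dirty records
  .print()

env.execute("custom-metric-monitoring")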

While the task is running, the two metric values normalDataNum and dirtyDataNum can be viewed in the monitoring interface of the Flink web UI. In addition, some inflow-rate monitoring can also be defined in AbsDeserialization, as sketched below.
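
One possible way to do that (a hedged sketch, not part of the original code: the subclass name and the metric name normalDataRate are illustrative) is to wrap an existing Counter in Flink's MeterView, which derives a per-second rate from the counter over a time span:

import org.apache.flink.metrics.{Meter, MeterView}

class RatedParseDeserialization extends ParseDeserialization {

  @transient protected var normalDataRate: Meter = _

  override def initMetric(): Unit = {
    super.initMetric() // registers the two counters first
    // per-second rate of normal records, averaged over the last 60 seconds
    normalDataRate = getRuntimeContext.getMetricGroup
      .meter("normalDataRate", new MeterView(normalDataNum, 60))
  }
}

Since CustomerKafkaConsumer calls initMetric polymorphically, the extra meter is registered without any change to the consumer itself.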
