Hands-on: Flink 1.12 asynchronous I/O access to external data - MySQL

Posted by FURQAN on Tue, 21 Dec 2021 04:23:48 +0100

WeChat official account: Big Data Development and Operations Architecture


Drawing on the official documentation and other online materials, this article explains Flink's asynchronous I/O API for accessing external data stores. Readers unfamiliar with asynchronous or event-driven programming may want to review Java's Future and CompletableFuture first; a minimal example follows the documentation link below.

Official documentation for Flink asynchronous I/O:

https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/datastream/operators/asyncio/
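
For readers new to Future-based, event-driven programming, here is a minimal, self-contained JDK example (independent of Flink) of the submit-then-callback pattern used throughout this article:

import java.util.concurrent.CompletableFuture;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        // Run a (possibly slow) task on another thread...
        CompletableFuture<String> future =
                CompletableFuture.supplyAsync(() -> "result of a slow lookup");
        // ...and register a callback instead of blocking on get().
        future.thenAccept(result -> System.out.println("callback received: " + result));
        Thread.sleep(100); // give the callback time to fire before the JVM exits
    }
}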

The overall flow of the program: receive user records (id, userNum, age) from a socket stream; for each record, use the userNum field to look up the corresponding name in the userCode dictionary table in MySQL; fill the queried userName back into the User object; and finally write the enriched record into the userInfo table in MySQL.
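
For example, assuming the dictionary table maps userNum 1002 to the name Jack (illustrative seed data is shown in the next section): the socket line 2,1002,24 is parsed into User{id=2, userNum=1002, age=24}, the lookup SELECT name FROM userCode WHERE id = 1002 returns Jack, and the row (2, 1002, 'Jack', 24) is written to userInfo.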

1. Table creation statements:

DDL for the userCode (dictionary) table and the userInfo (target) table:

CREATE TABLE `userInfo` (
  `id` int(11) NOT NULL,
  `userNum` int(11) DEFAULT NULL,
  `userName` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `userCode` (
  `id` int(11) NOT NULL,
  `name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
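
The lookup only returns something if the dictionary table contains matching rows; a couple of seed rows for testing (the names are made up for this demo):

INSERT INTO `userCode` (`id`, `name`) VALUES (1001, 'Tom');
INSERT INTO `userCode` (`id`, `name`) VALUES (1002, 'Jack');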

2. pom.xml dependencies (Alibaba Druid is introduced for database connection pooling):

<dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.44</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>druid</artifactId>
      <version>1.1.20</version>
    </dependency>
  </dependencies>
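
The ${flink.version} and ${scala.binary.version} placeholders above assume a <properties> block such as the following (the exact values are illustrative; this article targets Flink 1.12):

  <properties>
    <flink.version>1.12.0</flink.version>
    <scala.binary.version>2.12</scala.binary.version>
  </properties>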

3. The overall structure of the code is as follows:

Main class: FlinkMysqlAsyncIOMain

Asynchronous I/O function that queries MySQL: MysqlAsyncRichMapFunction

Sink that writes the results to MySQL: MysqlSinkFunction

Database utility class: DBUtils

User entity class: User

FlinkMysqlAsyncIOMain.java:

package com.hadoop.ljs.flink112.study.asyncIO;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.util.concurrent.TimeUnit;
/**
 * @author: Created By lujisen
 * @company China JiNan
 * @date: 2021-08-18 14:50
 * @version: v1.0
 * @description: com.hadoop.ljs.flink112.study.asyncIO
 */
public class FlinkMysqlAsyncIOMain {
    public static void main(String[] args) throws Exception {

        int maxIORequestCount = 20; /* Maximum number of concurrent asynchronous requests */
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> socketDataStream = senv.socketTextStream("192.168.0.111", 9999);

        // unorderedWait emits each result as soon as its lookup completes, which
        // is what makes the out-of-order effect described at the end of this
        // article observable; use orderedWait if results must keep the input order.
        DataStream<User> userDataStream = AsyncDataStream.unorderedWait(
                socketDataStream,
                new MysqlAsyncRichMapFunction(),   // Custom MySQL asynchronous lookup function
                500000,                            // Asynchronous operation timeout (ms)
                TimeUnit.MILLISECONDS,             // Time unit of the timeout
                maxIORequestCount                  // Maximum number of concurrent asynchronous requests
        );
        userDataStream.print();
        userDataStream.addSink(new MysqlSinkFunction<User>());
        senv.execute("FlinkMysqlAsyncIOMain");
    }
}
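
To run the example, first open the socket source on the configured host (192.168.0.111 above), for example with nc -lk 9999, then start the job and type the two input records listed at the end of this article into the nc session.
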
MysqlAsyncRichMapFunction.java:

package com.hadoop.ljs.flink112.study.asyncIO;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.concurrent.*;
import java.util.function.Supplier;
/**
 * @author: Created By lujisen
 * @company China JiNan
 * @date: 2021-08-18 10:25
 * @version: v1.0
 * @description: com.hadoop.ljs.flink112.study.asyncIO
 */
public class MysqlAsyncRichMapFunction extends RichAsyncFunction<String, User> {
    /** A non-serializable database client would normally be held in a transient
     *  field and created here in open(); in this example the thread pool and the
     *  Druid connection pool are created once in the static initializer of
     *  DBUtils, so there is nothing to set up. */
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
    }

    @Override
    public void asyncInvoke(String line, ResultFuture<User> resultFuture) throws Exception {
        String[] split = line.split(",");
        User user = new User();
        user.setId(Integer.valueOf(split[0]));
        user.setUserNum(Integer.valueOf(split[1]));
        user.setAge(Integer.valueOf(split[2]));

        Future<String> dbResult = DBUtils.executorService.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                String sql = "SELECT id, name FROM userCode WHERE id = ?";
                String userName = null;
                if (user.getUserNum() == 1001) {
                    System.out.println("current userNum: " + user.getUserNum() + ", sleeping for 30 seconds!!!");
                    Thread.sleep(30000);
                }
                /* Borrow a pooled connection; try-with-resources returns it to the
                   pool and closes the statement and result set in all cases. */
                try (Connection connection = DBUtils.getConnection();
                     PreparedStatement statement = connection.prepareStatement(sql)) {
                    statement.setInt(1, user.getUserNum());
                    try (ResultSet resultSet = statement.executeQuery()) {
                        while (resultSet.next()) {
                            userName = resultSet.getString("name");
                        }
                    }
                }
                return userName;
            }
        });
        CompletableFuture.supplyAsync(new Supplier<String>() {
            @Override
            public String get() {
                try {
                    return dbResult.get();
                } catch (InterruptedException | ExecutionException e) {
                    // In a real job, prefer resultFuture.completeExceptionally(e)
                    // so the failure surfaces instead of being swallowed as null.
                    return null;
                }
            }
        }).thenAccept((String userName) -> {
            user.setUserName(userName);
            resultFuture.complete(Collections.singleton(user));
        });
    }

    @Override
    public void close() throws Exception {
        super.close();
    }
}
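
By default, a request that exceeds the timeout configured in AsyncDataStream fails the whole job. RichAsyncFunction also lets you override the timeout() hook and choose a different policy; a minimal sketch (dropping the record by completing with an empty result is just one option, not what the original code does):

    @Override
    public void timeout(String input, ResultFuture<User> resultFuture) throws Exception {
        // Invoked instead of failing the job when a lookup exceeds the timeout.
        System.err.println("async lookup timed out for input: " + input);
        resultFuture.complete(java.util.Collections.emptyList());
    }
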
MysqlSinkFunction.java:

package com.hadoop.ljs.flink112.study.asyncIO;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import java.sql.PreparedStatement;
/**
 * @author: Created By lujisen
 * @company China JiNan
 * @date: 2021-08-18 15:02
 * @version: v1.0
 * @description: com.hadoop.ljs.flink112.study.asyncIO
 */
public class MysqlSinkFunction<U> extends RichSinkFunction<User> {
    /* A plain INSERT; add ON DUPLICATE KEY UPDATE if true upsert semantics are needed. */
    private static final String UPSERT_CASE = "INSERT INTO userInfo(id,userNum,userName,age) VALUES (?,?,?,?)";
    private PreparedStatement statement = null;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        statement = DBUtils.getConnection().prepareStatement(UPSERT_CASE);
    }

    @Override
    public void invoke(User user, Context context) throws Exception {
        statement.setInt(1, user.getId());
        statement.setInt(2, user.getUserNum());
        statement.setString(3, user.getUserName());
        statement.setInt(4, user.getAge());
        statement.addBatch();
        statement.executeBatch();  // flushes a batch of one row; see the batching note below
    }

    @Override
    public void close() throws Exception {
        super.close();
        if(statement!=null){
            statement.close();
        }
    }
}
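
invoke() above calls executeBatch() for every record, so each batch contains exactly one row. A common variation when throughput matters is to buffer a fixed number of rows before flushing; a sketch of the idea (the BATCH_SIZE of 100 is an arbitrary illustrative value, and any rows still buffered at shutdown would need a final executeBatch() in close()):

    private int batchCount = 0;
    private static final int BATCH_SIZE = 100;

    @Override
    public void invoke(User user, Context context) throws Exception {
        statement.setInt(1, user.getId());
        statement.setInt(2, user.getUserNum());
        statement.setString(3, user.getUserName());
        statement.setInt(4, user.getAge());
        statement.addBatch();
        if (++batchCount >= BATCH_SIZE) {
            statement.executeBatch();  // write the accumulated rows in one round trip
            batchCount = 0;
        }
    }
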
DBUtils.java:

package com.hadoop.ljs.flink112.study.asyncIO;

import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import static java.util.concurrent.Executors.newFixedThreadPool;
/**
 * @author: Created By lujisen
 * @company China JiNan
 * @date: 2021-08-18 15:19
 * @version: v1.0
 * @description: com.hadoop.ljs.flink112.study.asyncIO
 */
public class DBUtils {

    private static String jdbcUrl = "jdbc:mysql://192.168.0.111:3306/lujisen?characterEncoding=utf8";
    private static String username = "root";
    private static String password = "123456a?";
    private static String driverName = "com.mysql.jdbc.Driver";
    public static ExecutorService executorService = null;
    public static DruidDataSource dataSource = null;
    /*Maximum number of pool threads / active connections*/
    private static int maxPoolConn = 20;
    /* Static initializer: runs once per JVM to create the thread pool and the Druid connection pool */
    static {
        //Create thread pool
        executorService = newFixedThreadPool(maxPoolConn);
        dataSource=new DruidDataSource();

        dataSource.setDriverClassName(driverName);
        dataSource.setUsername(username);
        dataSource.setPassword(password);
        dataSource.setUrl(jdbcUrl);
        dataSource.setMaxActive(maxPoolConn);
    }

    public static Connection getConnection() throws SQLException {
        /* Borrow a connection from the Druid pool on every call; a single shared
           java.sql.Connection is not safe to use from multiple threads. Callers
           close the connection to return it to the pool. */
        return dataSource.getConnection();
    }
}
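
Because Druid hands out pooled connections, calling close() on a connection obtained from getConnection() returns it to the pool rather than closing the physical link, so per-query code can use try-with-resources safely. A minimal usage sketch (the id value 1001 is just an example):

    try (Connection conn = DBUtils.getConnection();
         PreparedStatement ps = conn.prepareStatement("SELECT name FROM userCode WHERE id = ?")) {
        ps.setInt(1, 1001);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
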
User.java:

package com.hadoop.ljs.flink112.study.asyncIO;
/**
 * @author: Created By lujisen
 * @company China JiNan
 * @date: 2021-08-18 10:13
 * @version: v1.0
 * @description: com.hadoop.ljs.flink112.study.asyncIO
 */
public class User {

    int id;
    int userNum;
    String userName;
    int age;
    public User() {
    }

    public User(int id, int userNum, String userName, int age) {
        this.id = id;
        this.userNum = userNum;
        this.userName = userName;
        this.age = age;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public int getUserNum() {
        return userNum;
    }

    public void setUserNum(int userNum) {
        this.userNum = userNum;
    }

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "User{" +
                "id=" + id +
                ", userNum=" + userNum +
                ", userName='" + userName + '\'' +
                ", age=" + age +
                '}';
    }
}
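
Incidentally, User satisfies Flink's POJO rules (public class, public no-argument constructor, and fields that are either public or exposed through getters and setters), so Flink can serialize it with its efficient POJO serializer.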

Note: to test, only the following two records need to be sent over the socket. The asynchronous I/O timeout in the main function is 500,000 ms, comfortably longer than the artificial delay in asyncInvoke of MysqlAsyncRichMapFunction, which sleeps for 30 seconds for the record whose userNum is 1001 (the first record below). The first record therefore blocks only its own lookup thread; the second record is processed immediately and written to MySQL, and after 30 seconds the first record completes as well. Watching the userInfo table, you can see the second record stored first and the first record stored about 30 seconds later, which is exactly the out-of-order completion that unorderedWait permits.

1,1001,23   (record 1)
2,1002,24   (record 2)

Topics: Big Data, Flink