ClickHouse load-balancing client: BalancedClickhouseDataSource source code analysis

Posted by rsnell on Sat, 29 Jan 2022 10:21:31 +0100

Source code analysis of BalancedClickhouseDataSource

The fully qualified name of BalancedClickhouseDataSource is ru.yandex.clickhouse.BalancedClickhouseDataSource. The source code consists mainly of three parts: the constructors, obtaining a connection, and maintaining the list of available addresses.

BalancedClickhouseDataSource implements javax.sql.DataSource. Among its fields, allUrls is the address list passed in through the constructor, and enabledUrls is the list of currently available addresses.

public class BalancedClickhouseDataSource implements javax.sql.DataSource {
    private final ThreadLocal<Random> randomThreadLocal = new ThreadLocal<Random>();
    private final List<String> allUrls;
    private volatile List<String> enabledUrls;
}

BalancedClickhouseDataSource has several constructors, but they all end up calling BalancedClickhouseDataSource(final List<String> urls, ClickHouseProperties properties). When multiple addresses are configured in a single URL, for example jdbc:clickhouse://10.170.4.81:8123,10.170.4.82:8123,10.170.4.83:8123,10.170.4.84:8123/datasets, the URL is split first, producing one URL per host: jdbc:clickhouse://10.170.4.81:8123/datasets, jdbc:clickhouse://10.170.4.82:8123/datasets, and so on.

    public BalancedClickhouseDataSource(final String url, Properties properties) {
        this(splitUrl(url), new ClickHouseProperties(properties));
    }
    
    static List<String> splitUrl(final String url) {
        Matcher m = URL_TEMPLATE.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Incorrect url");
        }
        String database = m.group(2);
        if (database == null) {
            database = "";
        }
        String[] hosts = m.group(1).split(",");
        final List<String> result = new ArrayList<String>(hosts.length);
        for (final String host : hosts) {
            result.add(JDBC_CLICKHOUSE_PREFIX + "//" + host + database);
        }
        return result;
    }

    private BalancedClickhouseDataSource(final List<String> urls, ClickHouseProperties properties) {
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("Incorrect ClickHouse jdbc url list. It must be not empty");
        }

        try {
            ClickHouseProperties localProperties = ClickhouseJdbcUrlParser.parse(urls.get(0), properties.asProperties());
            localProperties.setHost(null);
            localProperties.setPort(-1);

            this.properties = localProperties;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }


        List<String> allUrls = new ArrayList<String>(urls.size());
        for (final String url : urls) {
            try {
                if (driver.acceptsURL(url)) {
                    allUrls.add(url);
                } else {
                    log.error("that url is has not correct format: {}", url);
                }
            } catch (SQLException e) {
                throw new IllegalArgumentException("error while checking url: " + url, e);
            }
        }

        if (allUrls.isEmpty()) {
            throw new IllegalArgumentException("there are no correct urls");
        }

        this.allUrls = Collections.unmodifiableList(allUrls);
        this.enabledUrls = this.allUrls;
    }
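To make the splitting step concrete, here is a minimal standalone sketch that can be run without the driver. It is not the driver's own class: the class name UrlSplitSketch is hypothetical, and the regular expression is a simplified assumption (the pattern behind URL_TEMPLATE is not shown in the excerpt above and may accept more, such as query parameters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the URL-splitting step; not the driver's own class.
final class UrlSplitSketch {
    private static final String PREFIX = "jdbc:clickhouse:";

    // Assumption: a simplified pattern. Group 1 is the comma-separated
    // host:port list, group 2 is the optional "/database" suffix.
    private static final Pattern URL_TEMPLATE =
            Pattern.compile(Pattern.quote(PREFIX) + "//([a-zA-Z0-9_:,.-]+)(/[a-zA-Z0-9_]+)?");

    static List<String> splitUrl(String url) {
        Matcher m = URL_TEMPLATE.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Incorrect url: " + url);
        }
        // A missing database part becomes an empty suffix.
        String database = m.group(2) == null ? "" : m.group(2);
        List<String> result = new ArrayList<>();
        for (String host : m.group(1).split(",")) {
            // Every host gets the same prefix and database appended.
            result.add(PREFIX + "//" + host + database);
        }
        return result;
    }

    public static void main(String[] args) {
        String url = "jdbc:clickhouse://10.170.4.81:8123,10.170.4.82:8123/datasets";
        for (String single : splitUrl(url)) {
            System.out.println(single);
        }
    }
}
```

The point to notice is that the database suffix is shared: every host in the comma-separated list ends up with the same /datasets path.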

After initialization, connections are obtained through getConnection(). Each call picks a URL at random from the enabledUrls list via getAnyUrl() and connects to it.

    @Override
    public ClickHouseConnection getConnection() throws SQLException {
        return driver.connect(getAnyUrl(), properties);
    }
    
    private String getAnyUrl() throws SQLException {
        List<String> localEnabledUrls = enabledUrls;
        if (localEnabledUrls.isEmpty()) {
            throw new SQLException("Unable to get connection: there are no enabled urls");
        }
        Random random = this.randomThreadLocal.get();
        if (random == null) {
            this.randomThreadLocal.set(new Random());
            random = this.randomThreadLocal.get();
        }

        int index = random.nextInt(localEnabledUrls.size());
        return localEnabledUrls.get(index);
    }
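The selection logic can be tried in isolation with the sketch below. The class name RandomUrlPickerSketch and the setter are hypothetical, added only for illustration; ThreadLocal.withInitial is used as a shorter equivalent of the null check in the original code.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Standalone sketch of the random URL selection; not the driver's own class.
final class RandomUrlPickerSketch {
    // One Random per thread, created lazily on first use.
    private final ThreadLocal<Random> random = ThreadLocal.withInitial(Random::new);

    // Replaced as a whole by the health check, hence volatile.
    private volatile List<String> enabledUrls;

    RandomUrlPickerSketch(List<String> urls) {
        this.enabledUrls = urls;
    }

    void setEnabledUrls(List<String> urls) {
        this.enabledUrls = urls;
    }

    String getAnyUrl() throws SQLException {
        // Read the volatile field once so the size check and the lookup
        // operate on the same list even if it is swapped concurrently.
        List<String> snapshot = enabledUrls;
        if (snapshot.isEmpty()) {
            throw new SQLException("Unable to get connection: there are no enabled urls");
        }
        return snapshot.get(random.get().nextInt(snapshot.size()));
    }

    public static void main(String[] args) throws SQLException {
        RandomUrlPickerSketch picker = new RandomUrlPickerSketch(Arrays.asList(
                "jdbc:clickhouse://10.170.4.81:8123/datasets",
                "jdbc:clickhouse://10.170.4.82:8123/datasets"));
        for (int i = 0; i < 5; i++) {
            System.out.println(picker.getAnyUrl());
        }
    }
}
```

Copying enabledUrls into a local variable first matters: the list can be replaced by another thread between the emptiness check and the index lookup, and the local copy keeps both operations on one consistent list.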

Finally, let's look at how the available address list is maintained. scheduleActualization() schedules a background task that calls actualize() periodically to rebuild the available list. actualize() checks whether each node is available by executing the query SELECT 1 against it.

    /**
     * set time period for checking availability connections
     *
     * @param delay    value for time unit
     * @param timeUnit time unit for checking
     * @return this datasource with changed settings
     */
    public BalancedClickhouseDataSource scheduleActualization(int delay, TimeUnit timeUnit) {
        ClickHouseDriver.ScheduledConnectionCleaner.INSTANCE.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    actualize();
                } catch (Exception e) {
                    log.error("Unable to actualize urls", e);
                }
            }
        }, 0, delay, timeUnit);

        return this;
    }
    
    /**
     * Checks if clickhouse on url is alive, if it isn't, disable url, else enable.
     *
     * @return number of avaliable clickhouse urls
     */
    public synchronized int actualize() {
        List<String> enabledUrls = new ArrayList<String>(allUrls.size());

        for (String url : allUrls) {
            log.debug("Pinging disabled url: {}", url);
            if (ping(url)) {
                log.debug("Url is alive now: {}", url);
                enabledUrls.add(url);
            } else {
                log.debug("Url is dead now: {}", url);
            }
        }

        this.enabledUrls = Collections.unmodifiableList(enabledUrls);
        return enabledUrls.size();
    }
    
    private boolean ping(final String url) {
        try {
            driver.connect(url, properties).createStatement().execute("SELECT 1");
            return true;
        } catch (Exception e) {
            return false;
        }
    }
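The health-check cycle can be sketched without a ClickHouse server by replacing the SELECT 1 probe with a pluggable predicate. Everything here is an illustration under assumptions: the class name ActualizeSketch, the Predicate-based ping, and the caller-supplied executor are hypothetical and are not how the driver wires things internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Standalone sketch of the periodic availability check; not the driver's own class.
final class ActualizeSketch {
    private final List<String> allUrls;
    private final Predicate<String> ping; // stand-in for running "SELECT 1"
    private volatile List<String> enabledUrls;

    ActualizeSketch(List<String> allUrls, Predicate<String> ping) {
        this.allUrls = Collections.unmodifiableList(new ArrayList<>(allUrls));
        this.ping = ping;
        this.enabledUrls = this.allUrls; // everything is assumed alive at first
    }

    // Probes every configured URL and publishes the new list in one assignment.
    synchronized int actualize() {
        List<String> alive = new ArrayList<>(allUrls.size());
        for (String url : allUrls) {
            if (ping.test(url)) {
                alive.add(url);
            }
        }
        enabledUrls = Collections.unmodifiableList(alive);
        return alive.size();
    }

    // Runs actualize() immediately and then again after each delay.
    ScheduledFuture<?> scheduleActualization(ScheduledExecutorService executor,
                                             long delay, TimeUnit unit) {
        return executor.scheduleWithFixedDelay(() -> {
            try {
                actualize();
            } catch (RuntimeException e) {
                // Swallow so that one failed run does not cancel later runs.
            }
        }, 0, delay, unit);
    }

    List<String> getEnabledUrls() {
        return enabledUrls;
    }

    public static void main(String[] args) throws InterruptedException {
        ActualizeSketch sketch = new ActualizeSketch(
                Arrays.asList("node-up-1", "node-down", "node-up-2"),
                url -> !url.contains("down")); // pretend probe
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        sketch.scheduleActualization(executor, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        executor.shutdownNow();
        System.out.println(sketch.getEnabledUrls());
    }
}
```

Note that every run probes all configured URLs, not only the ones currently disabled, so a node that recovers is added back on the next cycle.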

Conclusion

The clickhouse-jdbc load-balancing client ru.yandex.clickhouse.BalancedClickhouseDataSource works by running a background task that periodically probes the ClickHouse servers and rebuilds the available address list. When a connection is requested, it picks a node at random from that list and connects to it.

The pitfall is that nothing calls scheduleActualization() automatically; it must be called manually, for example scheduleActualization(10, TimeUnit.SECONDS) right after constructing the data source. Otherwise enabledUrls stays equal to allUrls, so even with multiple addresses configured, once a node goes down each connection attempt still has a 1-in-N chance (for N configured addresses) of picking the dead node and failing.

Finally, BalancedClickhouseDataSource only ensures that connections succeed in most cases. Depending on the ping interval and timeouts, there is always a short window in which an address in the available list may already be down. Therefore, to achieve failover and high availability, the calling code must cooperate as well; it is best to add a retry mechanism.
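One possible shape for such a retry mechanism is sketched below. This is an assumption about how a caller might do it, not something the driver provides: RetrySketch, SqlSupplier, and withRetry are hypothetical names, and a real implementation would probably add a backoff delay and logging.

```java
import java.sql.SQLException;

// Standalone sketch of a caller-side retry wrapper; not part of the driver.
final class RetrySketch {
    // A supplier that is allowed to throw SQLException, e.g. getConnection().
    @FunctionalInterface
    interface SqlSupplier<T> {
        T get() throws SQLException;
    }

    // Calls the action up to maxAttempts times and rethrows the last failure.
    static <T> T withRetry(SqlSupplier<T> action, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (SQLException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = {0};
        // Simulated action: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException("node unavailable");
            }
            return "connected";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a data source instance this could be used as RetrySketch.withRetry(dataSource::getConnection, 3). Because each getConnection() call picks a URL at random, a retry may land on the same dead node again, so the attempt count should be chosen with the number of configured addresses in mind.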

Topics: clickhouse