Maxwell "Couldn't find database mysql" Troubleshooting

Posted by lhale on Mon, 07 Mar 2022 23:16:34 +0100

1, issue description

Recently, a mysql binlog data extraction maxwell process in prod environment was monitored by prometheus. There was a problem in its operation. The monitoring showed that the process hung up frequently.
Because the maxwell instance service is under systemct service control, it will automatically pull up the program 20 seconds after it hangs up.

The reason is that it is speculated that the maxwell process can not be started due to failure, and is pulled up by systemct service again and again. It is monitored by prometheus as frequent restart.

By viewing the service log, you can see the following error reports:

[root@server-xx system]# journalctl -xef


20:11:39,959 INFO  TaskManager - Stopping: com.zendesk.maxwell.schema.PositionStoreThread@6a827077
20:11:39,959 INFO  StoppableTaskState - com.zendesk.maxwell.schema.PositionStoreThread requestStop() called (in state: RUNNING)
20:11:39,959 INFO  TaskManager - Stopping: com.zendesk.maxwell.producer.MaxwellKafkaProducerWorker@565055db
20:11:39,959 INFO  StoppableTaskState - MaxwellKafkaProducerWorker requestStop() called (in state: RUNNING)
20:11:39,960 INFO  KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
20:11:39,964 INFO  TaskManager - Stopping: com.zendesk.maxwell.bootstrap.BootstrapController@d021375
20:11:39,964 INFO  StoppableTaskState - com.zendesk.maxwell.bootstrap.BootstrapController requestStop() called (in state: RUNNING)
20:11:39,964 INFO  TaskManager - Stopping: com.zendesk.maxwell.replication.BinlogConnectorReplicator@368831cf
20:11:39,964 INFO  StoppableTaskState - com.zendesk.maxwell.replication.BinlogConnectorReplicator requestStop() called (in state: STOPPED)
java.lang.RuntimeException: Couldn't find database mysql
	at com.zendesk.maxwell.replication.TableCache.processEvent(TableCache.java:28)
	at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getTransactionRows(BinlogConnectorReplicator.java:486)
	at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getRow(BinlogConnectorReplicator.java:592)
	at com.zendesk.maxwell.replication.BinlogConnectorReplicator.work(BinlogConnectorReplicator.java:175)
	at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:34)
	at com.zendesk.maxwell.Maxwell.startInner(Maxwell.java:222)
	at com.zendesk.maxwell.Maxwell.start(Maxwell.java:156)
	at com.zendesk.maxwell.Maxwell.main(Maxwell.java:243)

2, maxwell configuration related

1. maxwell profile

my_biz_db1_2_kafka.properties

producer=kafka

host=${MAXWELL_META_DB}
user=maxwell
password=${PASSWORD}
port=3306

client_id=my_biz_1
replica_server_id=1307

replication_host=${BIZ_1_DB}
replication_user=david_test
replication_password=${PASSWORD}
replication_port=3307
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai

#### Set white list filtering method 1
#filter=exclude: *.*, include: saas_v1.*

#### Set white list filtering method 2
exclude_dbs=*
include_dbs=biz_db_1,biz_db_2
include_tables=db_1_table_1,db_1_table_2,db_2_table_1,db_2_table_2,db_2_table_3

kafka.bootstrap.servers=KFK_SERVER_1:9092,KFK_SERVER_2:9092,KFK_SERVER_3:9092
kafka_topic=${KFK_TOPIC_NAME}
kafka_partition_hash=murmur3
producer_partition_by=primary_key

2. Command start
/opt/module/maxwell/maxwell-1.22.1/bin/maxwell --config /opt/module/maxwell/maxwell-1.22.1/my_custom_config/my_biz_db1_2_kafka.properties

3. Encapsulated in services
The path is / etc/systemd/system and the service name is maxwell_my_biz_db1_2_kafka.service
Administrative commands:

systemctl ( start | stop | restart | status | enable |  is-enabled | disable ) maxwell_biz_1.service

3, Troubleshooting approach

Phase I

Seeing the error report, I initially thought whether it was bigdata, the user of mysql binlog monitored by maxwell_ Is admin not authorized to access mysql? The DBA inquired that the user really can't access the business mysql database, so they negotiated with the DBA to help bigdata_ Access rights granted by admin to mysql database:

# Therefore, bigdata was opened before_ Admin is the extraction account of binlog library. The password has been set. Just execute this command here
grant all privileges on `mysql`.* to 'bigdata_admin'@'%';

#
grant all privileges on `mysql`.* to 'bigdata_admin '@'%' identified by 'xxxx';

After setting, restart the maxwell process. The problem remains and the process still fails to start.

Phase II

mysql> insert into databases (schema_id,name,charset) values(230,'mysql','utf8');

insert into tables (schema_id,database_id,name,charset,pk) values(230,61,'user','utf8','Host,User');

Phase 3: reproduce the bug in the test environment

The following sql statements will not trigger the generation of binlog logs:

REVOKE INDEX ON *.* FROM 'user_test'@'%';
GRANT INDEX ON *.* TO 'user_test'@'%';
CREATE USER 'user_test'@'%' IDENTIFIED BY 'xxxx';

If you execute the following statements, or perform similar operations in the gui of navicat:

UPDATE `mysql`.`user` SET `Select_priv`='Y' WHERE (`Host`='%') AND (`User`='user_test') LIMIT 1

Will directly trigger the maxwell process to hang up.

Ways of issue recurrence:

1. Account liuwei_test does not have access to mysql;
2. The mysql database is not monitored in the databases list of maxwell;
3. The mysql profile in the mysql maxwell library does not listen.
4. The upstream makes a curd operation on the user (this will trigger the modification of the user table of the mysql Library).
especially:
UPDATE mysql.user SET Select_priv='Y' WHERE (Host='%') AND (User='user_test') LIMIT 1

4, Troubleshooting summary

4.1 for new maxwell instances

When big data uses maxwell to monitor a new database, the following steps must be strictly followed:

1. Give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.
2. Others can be deployed according to the normal maxwell deployment process.
3. Start the maxwell instance that monitors the binlog of the library, and you can listen to the mysql metabase in the business library in the maxwell metabase.

4.2 example of maxwell

For the maxwell process that has been started but does not listen to the mysql metabase in the business library, the following methods should be used to solve such problems:
Method 1:

1. (optional, not required) give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.;
2. In the maxwell metabase, add the monitoring of mysql metabase in the business library and all tables under the database.

Method 2:

1. (mandatory) give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.;
2. Select an appropriate time point (e.g. 23:45 of the property) and re brush the maxwell metabase. The specific operation steps are as follows:
A). Close all maxwell processes;
systemctl stop maxwell_biz_1.service
systemctl stop maxwell_biz_2.service
systemctl stop maxwell_biz_3.service
B). Log in to the mysql database specified by the host parameter in the maxwell configuration file and delete the maxwell metabase;
C). Restart all maxwell instance processes
systemctl start maxwell_biz_1.service
systemctl start maxwell_biz_2.service
systemctl start maxwell_biz_3.service

Topics: Big Data