1, issue description
Recently, a mysql binlog data extraction maxwell process in prod environment was monitored by prometheus. There was a problem in its operation. The monitoring showed that the process hung up frequently.
Because the maxwell instance service is under systemct service control, it will automatically pull up the program 20 seconds after it hangs up.
The reason is that it is speculated that the maxwell process can not be started due to failure, and is pulled up by systemct service again and again. It is monitored by prometheus as frequent restart.
By viewing the service log, you can see the following error reports:
[root@server-xx system]# journalctl -xef 20:11:39,959 INFO TaskManager - Stopping: com.zendesk.maxwell.schema.PositionStoreThread@6a827077 20:11:39,959 INFO StoppableTaskState - com.zendesk.maxwell.schema.PositionStoreThread requestStop() called (in state: RUNNING) 20:11:39,959 INFO TaskManager - Stopping: com.zendesk.maxwell.producer.MaxwellKafkaProducerWorker@565055db 20:11:39,959 INFO StoppableTaskState - MaxwellKafkaProducerWorker requestStop() called (in state: RUNNING) 20:11:39,960 INFO KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. 20:11:39,964 INFO TaskManager - Stopping: com.zendesk.maxwell.bootstrap.BootstrapController@d021375 20:11:39,964 INFO StoppableTaskState - com.zendesk.maxwell.bootstrap.BootstrapController requestStop() called (in state: RUNNING) 20:11:39,964 INFO TaskManager - Stopping: com.zendesk.maxwell.replication.BinlogConnectorReplicator@368831cf 20:11:39,964 INFO StoppableTaskState - com.zendesk.maxwell.replication.BinlogConnectorReplicator requestStop() called (in state: STOPPED) java.lang.RuntimeException: Couldn't find database mysql at com.zendesk.maxwell.replication.TableCache.processEvent(TableCache.java:28) at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getTransactionRows(BinlogConnectorReplicator.java:486) at com.zendesk.maxwell.replication.BinlogConnectorReplicator.getRow(BinlogConnectorReplicator.java:592) at com.zendesk.maxwell.replication.BinlogConnectorReplicator.work(BinlogConnectorReplicator.java:175) at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:34) at com.zendesk.maxwell.Maxwell.startInner(Maxwell.java:222) at com.zendesk.maxwell.Maxwell.start(Maxwell.java:156) at com.zendesk.maxwell.Maxwell.main(Maxwell.java:243)
2, maxwell configuration related
1. maxwell profile
my_biz_db1_2_kafka.properties
producer=kafka host=${MAXWELL_META_DB} user=maxwell password=${PASSWORD} port=3306 client_id=my_biz_1 replica_server_id=1307 replication_host=${BIZ_1_DB} replication_user=david_test replication_password=${PASSWORD} replication_port=3307 jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai #### Set white list filtering method 1 #filter=exclude: *.*, include: saas_v1.* #### Set white list filtering method 2 exclude_dbs=* include_dbs=biz_db_1,biz_db_2 include_tables=db_1_table_1,db_1_table_2,db_2_table_1,db_2_table_2,db_2_table_3 kafka.bootstrap.servers=KFK_SERVER_1:9092,KFK_SERVER_2:9092,KFK_SERVER_3:9092 kafka_topic=${KFK_TOPIC_NAME} kafka_partition_hash=murmur3 producer_partition_by=primary_key
2. Command start
/opt/module/maxwell/maxwell-1.22.1/bin/maxwell --config /opt/module/maxwell/maxwell-1.22.1/my_custom_config/my_biz_db1_2_kafka.properties
3. Encapsulated in services
The path is / etc/systemd/system and the service name is maxwell_my_biz_db1_2_kafka.service
Administrative commands:
systemctl ( start | stop | restart | status | enable | is-enabled | disable ) maxwell_biz_1.service
3, Troubleshooting approach
Phase I
Seeing the error report, I initially thought whether it was bigdata, the user of mysql binlog monitored by maxwell_ Is admin not authorized to access mysql? The DBA inquired that the user really can't access the business mysql database, so they negotiated with the DBA to help bigdata_ Access rights granted by admin to mysql database:
# Therefore, bigdata was opened before_ Admin is the extraction account of binlog library. The password has been set. Just execute this command here grant all privileges on `mysql`.* to 'bigdata_admin'@'%'; # grant all privileges on `mysql`.* to 'bigdata_admin '@'%' identified by 'xxxx';
After setting, restart the maxwell process. The problem remains and the process still fails to start.
Phase II
mysql> insert into databases (schema_id,name,charset) values(230,'mysql','utf8');
insert into tables (schema_id,database_id,name,charset,pk) values(230,61,'user','utf8','Host,User');
Phase 3: reproduce the bug in the test environment
The following sql statements will not trigger the generation of binlog logs:
REVOKE INDEX ON *.* FROM 'user_test'@'%'; GRANT INDEX ON *.* TO 'user_test'@'%'; CREATE USER 'user_test'@'%' IDENTIFIED BY 'xxxx';
If you execute the following statements, or perform similar operations in the gui of navicat:
UPDATE `mysql`.`user` SET `Select_priv`='Y' WHERE (`Host`='%') AND (`User`='user_test') LIMIT 1
Will directly trigger the maxwell process to hang up.
Ways of issue recurrence:
1. Account liuwei_test does not have access to mysql;
2. The mysql database is not monitored in the databases list of maxwell;
3. The mysql profile in the mysql maxwell library does not listen.
4. The upstream makes a curd operation on the user (this will trigger the modification of the user table of the mysql Library).
especially:
UPDATE mysql.user SET Select_priv='Y' WHERE (Host='%') AND (User='user_test') LIMIT 1
4, Troubleshooting summary
4.1 for new maxwell instances
When big data uses maxwell to monitor a new database, the following steps must be strictly followed:
1. Give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.
2. Others can be deployed according to the normal maxwell deployment process.
3. Start the maxwell instance that monitors the binlog of the library, and you can listen to the mysql metabase in the business library in the maxwell metabase.
4.2 example of maxwell
For the maxwell process that has been started but does not listen to the mysql metabase in the business library, the following methods should be used to solve such problems:
Method 1:
1. (optional, not required) give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.;
2. In the maxwell metabase, add the monitoring of mysql metabase in the business library and all tables under the database.
Method 2:
1. (mandatory) give the business mysql connection user (such as bigdata_admin) the permission to access the mysql metabase in the business database.;
2. Select an appropriate time point (e.g. 23:45 of the property) and re brush the maxwell metabase. The specific operation steps are as follows:
A). Close all maxwell processes;
systemctl stop maxwell_biz_1.service
systemctl stop maxwell_biz_2.service
systemctl stop maxwell_biz_3.service
B). Log in to the mysql database specified by the host parameter in the maxwell configuration file and delete the maxwell metabase;
C). Restart all maxwell instance processes
systemctl start maxwell_biz_1.service
systemctl start maxwell_biz_2.service
systemctl start maxwell_biz_3.service