Could not flush and close the file system output stream

Posted by foxden on Thu, 10 Oct 2019 06:09:17 +0200

A Flink job that consumes Kafka data, running in Flink on YARN mode, had previously been deployed to both the test and production environments and ran without problems. After the test environment was restarted and the job was redeployed, however, it failed with the following error:

2019-07-01 15:19:25,984 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Sink: Coupon Sink (1/1) (28578957b82c7fccd680cc4fb5fbb7cd) switched from RUNNING to FAILED.
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 8 for operator Source: Custom Source -> Sink: Coupon Sink (1/1).}
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 8 for operator Source: Custom Source -> Sink: Coupon Sink (1/1).
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
	... 6 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://cxhadoop/flink/checkpoints/292e9f2140f8abc69acaadb99cfd4c58/chk-8/91154fad-3667-4dd3-9b1d-a503c0054207 in order to obtain the stream state handle
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
	at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
	... 5 more
Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://cxhadoop/flink/checkpoints/292e9f2140f8abc69acaadb99cfd4c58/chk-8/91154fad-3667-4dd3-9b1d-a503c0054207 in order to obtain the stream state handle
	at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:326)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackend.java:767)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackend.java:696)
	at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:76)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
	... 7 more
Caused by: java.io.IOException: DataStreamer Exception: 
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:695)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1413)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1357)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:587)
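
When reading a nested trace like the one above, it can help to walk the "Caused by" chain from the outermost exception down to the innermost one. The following is only a minimal sketch of that idea: the class name CauseChain and its two helper methods are hypothetical, and the exceptions built in main are stand-ins that mirror the messages in the log, not objects taken from Flink or Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

// Hypothetical helper for illustration; not part of Flink or Hadoop.
final class CauseChain {

    // Returns the exceptions from the outermost one down to the innermost cause.
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        // Stop on null, or if a cause points back to an exception already seen.
        while (t != null && !chain.contains(t)) {
            chain.add(t);
            t = t.getCause();
        }
        return chain;
    }

    // Returns the last element of the chain, i.e. the innermost "Caused by".
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mirror the nesting shown in the log.
        Throwable root = new NoClassDefFoundError(
                "Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants");
        Throwable streamer = new IOException("DataStreamer Exception: ", root);
        Throwable flush = new IOException(
                "Could not flush and close the file system output stream", streamer);
        Throwable future = new ExecutionException(flush);
        Throwable materialize = new Exception("Could not materialize checkpoint 8", future);

        // Print every level, outermost first.
        for (Throwable t : causeChain(materialize)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
        System.out.println("Innermost cause: " + rootCause(materialize).getClass().getName());
    }
}
```

The innermost entry is usually the most specific symptom, so it is a reasonable place to start the analysis.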

You can see from the log that checkpointing failed. The trace is long, but two messages stand out: "Could not materialize checkpoint 8 for operator Source: Custom Source -> Sink: Coupon Sink (1/1)" and "Could not flush and close the file system output stream". Following the "Caused by" chain, the second error is the cause of the first, and it is itself caused by a DataStreamer Exception that wraps java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants. In other words, the checkpoint file could not be written to HDFS. Check the checkpoint configuration in Flink:

state.backend: filesystem

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
state.checkpoints.dir: hdfs://cxhadoop/flink/checkpoints
state.checkpoints.num-retained: 20

# Default target directory for savepoints, optional.
#
state.savepoints.dir: hdfs://cxhadoop/flink/savepoints

The configuration contains an extra line, state.checkpoints.num-retained, which sets the maximum number of checkpoints retained in the checkpoint directory. Our understanding was that once this limit is exceeded, new checkpoints cannot be created. Looking at the checkpoint directory, we found that the number of checkpoints had long exceeded this value, and we concluded that this was what caused the error in the Flink program: every time the job started, it could not create the checkpoint files properly. Our fix was to comment out this line so that the number of retained checkpoints is no longer set explicitly.
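
To compare the number of checkpoint directories against state.checkpoints.num-retained, the chk-N entries under the job's checkpoint path can be counted. The sketch below assumes the directory listing has already been obtained some other way (for example by copying the paths from `hdfs dfs -ls` output). The class CheckpointCount and its method are hypothetical, and apart from chk-8, which appears in the log, the paths in main are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it only inspects path strings.
final class CheckpointCount {

    // Matches a last path segment of the form chk-<number>, as in ".../chk-8".
    private static final Pattern CHK_DIR = Pattern.compile("chk-\\d+");

    // Counts the paths whose last segment looks like a checkpoint directory.
    static long countCheckpointDirs(List<String> paths) {
        return paths.stream()
                // Drop a single trailing slash so "chk-8/" is treated like "chk-8".
                .map(p -> p.endsWith("/") ? p.substring(0, p.length() - 1) : p)
                // Keep only the last path segment.
                .map(p -> p.substring(p.lastIndexOf('/') + 1))
                .filter(name -> CHK_DIR.matcher(name).matches())
                .count();
    }

    public static void main(String[] args) {
        String base = "hdfs://cxhadoop/flink/checkpoints/292e9f2140f8abc69acaadb99cfd4c58/";
        // Illustrative listing; only chk-8 is taken from the log.
        List<String> listing = Arrays.asList(
                base + "chk-6", base + "chk-7", base + "chk-8", base + "shared");
        int numRetained = 20; // value of state.checkpoints.num-retained in the configuration
        long found = countCheckpointDirs(listing);
        System.out.println(found + " checkpoint directories found, num-retained is " + numRetained);
    }
}
```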

Topics: Java Apache Hadoop kafka