java.io.IOException


  • I am running a three node ZooKeeper cluster. Renames of acceptedEpoch.tmp to acceptedEpoch and currentEpoch.tmp to currentEpoch have to be persisted to disk by explicitly issuing an fsync on the parent directory. If not, the rename might not hit the disk immediately, and if a crash occurs at this point, the server will fail to start with the following error in the log. If this happens on two or more nodes, the cluster can become unavailable.
    [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /tmp/zoo2.cfg
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.2 to address: /127.0.0.2
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.4 to address: /127.0.0.4
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.3 to address: /127.0.0.3
    [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
    [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
    [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
    [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
    [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2182
    [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
    [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
    [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
    [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
    [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /run/shm/dice-4636/113-98-129-z_majority_RO_OM_0=60_1=55/rdir-0/version-2/snapshot.100000002
    [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
    java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    2016-04-15 03:24:57,144 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    Caused by: java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
        ... 4 more
    Similarly, when a new log file is created, the parent directory needs to be explicitly fsynced to persist the log file; otherwise data loss is possible (we have reproduced the above issues). Please see https://www.quora.com/Linux/When-should-you-fsync-the-containing-directory-in-addition-to-the-file-itself and http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf. (A sketch of the rename-then-fsync-parent pattern follows this entry.)
    via Ramnatthan Alagappan
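    A minimal sketch of the rename-then-fsync-parent pattern described in this entry, assuming a Linux file system and a JVM that permits opening a directory read-only and calling force() on it (platform-dependent behaviour). The class name and paths are illustrative, not ZooKeeper code.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableRename {

    /**
     * Write value to target via a temp file: fsync the temp file, rename it
     * over the target, then fsync the parent directory so the rename itself
     * survives a crash.
     */
    static void writeDurably(Path target, String value) throws IOException {
        Path dir = target.getParent();
        Path tmp = dir.resolve(target.getFileName() + ".tmp");

        // 1. Write and fsync the temp file contents.
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
            ch.force(true);
        }

        // 2. Atomically rename the temp file over the target.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);

        // 3. fsync the parent directory so the directory entry (the rename)
        //    is persisted too. Opening a directory read-only and forcing it
        //    works on Linux with common JVMs but is not guaranteed everywhere.
        try (FileChannel dirCh = FileChannel.open(dir, StandardOpenOption.READ)) {
            dirCh.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get("/tmp/zkdemo/version-2");   // illustrative path
        Files.createDirectories(dataDir);
        writeDurably(dataDir.resolve("acceptedEpoch"), "2");
    }
}
{code}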
  • I have zookeeper running normally just fine in a 3-server cluster. Then I try to configure zookeeper to use Kerberos, following docs in the Solr wiki here: https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin I can't even get to the fun Kerberos errors. When I start with {{JVMFLAGS="-Djava.security.auth.login.config=/opt/zookeeper/jaas-server.conf"}} and this jaas-server.conf:
{code}
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab=/keytabs/vdev-solr-01.keytab
  storeKey=true
  doNotPrompt=true
  useTicketCache=false
  debug=true
  principal="HTTP/<snip>";
}
{code}
    I get this in the log:
{code}
2016-02-10 16:16:51,327 [myid:1] - ERROR [main:ServerCnxnFactory@195] - No JAAS configuration section named 'Server' was foundin '/opt/zookeeper/jaas-server.conf'.
2016-02-10 16:16:51,328 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.io.IOException: No JAAS configuration section named 'Server' was foundin '/opt/zookeeper/jaas-server.conf'.
    at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:196)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:87)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:130)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}
    (Note the "foundin" typo.) I get the exact same error whether the jaas-server.conf file exists or not. Later I found that the Solr wiki was wrong: its example had dropped the double quotes around the keytab value. It would be nice if ZooKeeper spewed a more useful message when it can't parse the configuration. (A corrected config is sketched after this entry.)
    via Dan Fitch
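    Based on the reporter's own conclusion that the missing double quotes were the problem, a corrected jaas-server.conf would presumably quote the keyTab path; everything else is kept as in the report, including the {{<snip>}} placeholder in the principal:
{code}
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/keytabs/vdev-solr-01.keytab"
  storeKey=true
  doNotPrompt=true
  useTicketCache=false
  debug=true
  principal="HTTP/<snip>";
};
{code}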
  • I am running a three node ZooKeeper cluster. When a new log file is created by ZooKeeper, I see the following sequence of system calls:
    1. creat(new_log)
    2. write(new_log, count=16)                      // this is the log header, I believe
    3. truncate(new_log, from 16 bytes to 16 KBytes) // I have configured the log size to be 16K
    When the above sequence of operations completes, it is reasonable to expect the newly created log file to contain the header (16 bytes) and then be filled with zeros until the end of the log. But if a crash occurs (due to a power failure) while the truncate system call is in progress, it is possible for the log to contain garbage data when the system restarts from the crash. Note that if the crash occurs just after the truncate system call completes, there is no problem. Basically, the truncate needs to be atomically persisted for ZooKeeper to recover from crashes correctly, or (more realistically) the recovery code needs to deal with the case of finding garbage in a newly created log (an illustrative tail-tolerant reader is sketched after this entry). As mentioned, if a crash occurs during the truncate system call, ZooKeeper fails to start with the following exception. Here is the stack trace:
    java.io.IOException: Unreasonable length = -295704495
        at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:552)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:527)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    Caused by: java.io.IOException: Unreasonable length = -295704495
        at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:552)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:527)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        ... 4 more
    Moreover, it is possible for two nodes of a 3-node ZooKeeper cluster to reach the same state. In that case, both will fail to start up, rendering the entire cluster unavailable.
    via Ramnatthan Alagappan
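    This is not ZooKeeper's actual recovery code, only an illustrative sketch of the defensive reading the reporter asks for: records framed as a 4-byte length plus payload, where a zero, negative, or implausibly large length is treated as the end of the usable log rather than a fatal error. The record format, the size bound, and the class name are assumptions.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Illustrative tail-tolerant log reader. Records: 4-byte big-endian length
 * followed by that many payload bytes. Anything that does not parse as a
 * plausible record is assumed to be the zero-filled or garbage tail of a
 * preallocated log and simply ends the scan.
 */
public class TailTolerantLogReader {

    private static final int MAX_RECORD = 1 << 20;      // 1 MiB sanity bound (assumption)

    public static int countValidRecords(Path log) throws IOException {
        int records = 0;
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            ByteBuffer lenBuf = ByteBuffer.allocate(4);
            while (true) {
                lenBuf.clear();
                if (readFully(ch, lenBuf) < 4) {
                    break;                               // clean end of file
                }
                lenBuf.flip();
                int len = lenBuf.getInt();
                if (len <= 0 || len > MAX_RECORD) {
                    break;                               // garbage/zero tail: stop, do not throw
                }
                ByteBuffer payload = ByteBuffer.allocate(len);
                if (readFully(ch, payload) < len) {
                    break;                               // torn record at the tail: stop here too
                }
                records++;                               // a real log would also verify a checksum
            }
        }
        return records;
    }

    private static int readFully(FileChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
{code}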
  • Possible Cluster Unavailability. I am running a three node ZooKeeper cluster. Each node runs Linux. I see the below sequence of system calls when ZooKeeper appends a user data item to the log file:
    1. write("/data/version-2/log.200000001", offset=65, count=12)
    2. write("/data/version-2/log.200000001", offset=77, count=16323)
    3. write("/data/version-2/log.200000001", offset=16400, count=4209)
    4. write("/data/version-2/log.200000001", offset=20609, count=1)
    5. fdatasync("/data//version-2/log.200000001")
    Now, a crash could happen just after operation 4 but before the final fdatasync. In this situation, the file system could persist the 4th operation yet fail to persist the 3rd, since there is no fsync between them. In such cases, the ZooKeeper server fails to start with the following messages in its log file:
    [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /tmp/zoo2.cfg
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.2 to address: /127.0.0.2
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.4 to address: /127.0.0.4
    [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.3 to address: /127.0.0.3
    [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
    [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
    [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
    [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
    [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2182
    [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
    [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
    [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
    [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
    [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /data/version-2/snapshot.100000002
    [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
    java.io.IOException: CRC check failed
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    2016-04-15 04:00:32,795 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    Caused by: java.io.IOException: CRC check failed
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
        ... 4 more
    The same happens when the 3rd and 4th writes hit the disk but the 2nd does not. Two nodes of a three-node cluster can easily reach this state, rendering the entire cluster unavailable. ZooKeeper, on recovery, should be able to handle such checksum mismatches gracefully to maintain cluster availability. (A sketch of checksum-tolerant replay follows this entry.)
    via Ramnatthan Alagappan
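    A sketch, under an assumed record framing (length, CRC32, payload), of the graceful behaviour the reporter asks for: replay stops at the first record whose checksum does not verify instead of failing the whole server. This is not ZooKeeper's FileTxnLog format, only an illustration of the idea.
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Illustrative record framing with a per-record CRC32, plus a replay helper
 * that treats a checksum mismatch as the end of the usable log rather than
 * a fatal error.
 */
public class ChecksummedRecords {

    /** Frame: [4-byte length][8-byte CRC32 of payload][payload]. */
    static ByteBuffer frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length);
        buf.putLong(crc.getValue());
        buf.put(payload);
        buf.flip();
        return buf;
    }

    /**
     * Returns the number of records that verify, stopping (not throwing) at
     * the first record whose CRC does not match; everything from that point
     * on is assumed to be an incomplete append lost in a crash.
     */
    static int replay(ByteBuffer log) {
        int good = 0;
        while (log.remaining() >= 12) {
            int len = log.getInt();
            long storedCrc = log.getLong();
            if (len <= 0 || len > log.remaining()) {
                break;                                   // torn length or short payload
            }
            byte[] payload = new byte[len];
            log.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if (crc.getValue() != storedCrc) {
                break;                                   // tail mismatch: stop replay here
            }
            good++;
        }
        return good;
    }

    public static void main(String[] args) {
        ByteBuffer a = frame("create /node-1".getBytes(StandardCharsets.UTF_8));
        ByteBuffer b = frame("create /node-2".getBytes(StandardCharsets.UTF_8));
        ByteBuffer log = ByteBuffer.allocate(a.remaining() + b.remaining());
        log.put(a).put(b);
        // Simulate a torn final append: corrupt the last byte of the second record.
        log.put(log.position() - 1, (byte) 0x00);
        log.flip();
        System.out.println("valid records: " + replay(log));   // prints 1
    }
}
{code}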
  • Hi all. After shutting ZK down and upgrading to CentOS 7, ZK would not start, with this exception:
    Removing file: Dec 19, 2016 10:55:08 PM /hedvig/hpod/log/version-2/log.300ee0308
    Removing file: Dec 19, 2016 7:11:23 PM /hedvig/hpod/data/version-2/snapshot.300ee0307
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at com.hedvig.hpod.service.PodnetService$1.run(PodnetService.java:2262)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: The current epoch, 3, is older than the last zxid, 17179871862
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:539)
        ... 4 more
    All logs are empty, and the following snapshot and commit logs exist (output of find .):
    .
    ./log
    ./log/version-2
    ./log/version-2/log.40000010a
    ./log/version-2/log.300ef712b
    ./log/version-2/log.300f0659e
    ./log/version-2/.ignore
    ./data
    ./data/version-2
    ./data/version-2/snapshot.400000109
    ./data/version-2/currentEpoch
    ./data/version-2/acceptedEpoch
    ./data/version-2/snapshot.300ef712a
    ./data/version-2/snapshot.300f0659d
    ./data/myid.bak
    ./data/myid
    On other nodes we had the same exception but no commit log deletion.
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
        at com.hedvig.hpod.service.PodnetService$1.run(PodnetService.java:2262)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: The current epoch, 3, is older than the last zxid, 17179871862
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:539)
    ./log
    ./log/version-2
    ./log/version-2/log.300f06cfc
    ./log/version-2/log.300f03890
    ./log/version-2/.ignore
    ./data
    ./data/version-2
    ./data/version-2/snapshot.300f06cfb
    ./data/version-2/snapshot.300f06f10
    ./data/version-2/currentEpoch
    ./data/version-2/acceptedEpoch
    ./data/version-2/snapshot.300f0388f
    ./data/myid.bak
    ./data/myid
    ./log
    ./log/version-2
    ./log/version-2/log.300f06dbf
    ./log/version-2/log.300ed96fc
    ./log/version-2/log.300ef1048
    ./log/version-2/.ignore
    ./data
    ./data/version-2
    ./data/version-2/snapshot.300f06dbe
    ./data/version-2/currentEpoch
    ./data/version-2/acceptedEpoch
    ./data/version-2/snapshot.300ed96fb
    ./data/version-2/snapshot.300ef1048
    ./data/myid.bak
    ./data/myid
    The symptoms look like ZOOKEEPER-1549, but we are running 3.4.9 here. Any ideas? (A quick way to decode the epoch embedded in that zxid is sketched after this entry.)
    via Lasaro Camargos
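    The check behind this exception compares the value in the currentEpoch file with the epoch embedded in the last zxid found on disk; a ZooKeeper zxid packs the epoch into its high 32 bits and a per-epoch counter into the low 32 bits. A quick way to decode the zxid from the report (the standalone class is hypothetical, the constant is the reported value):
{code}
public class ZxidEpoch {
    public static void main(String[] args) {
        long lastZxid = 17179871862L;           // zxid from the report
        long epochOfZxid = lastZxid >>> 32;     // high 32 bits = epoch
        long counter = lastZxid & 0xffffffffL;  // low 32 bits = per-epoch counter
        System.out.printf("zxid=0x%x epoch=%d counter=%d%n", lastZxid, epochOfZxid, counter);
        // Prints epoch=4, counter=2678: the log data is from epoch 4, but the
        // currentEpoch file on disk still says 3, hence the IOException above.
    }
}
{code}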
  • Please ignore. I tested against the wrong version of ZooKeeper and this was resolved by ZOOKEEPER-1653. -We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.-
{code}
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
    at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}
    -along with this strange stack trace in the logs:-
{code}
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
    at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}
    -It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. {{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. The interrupt may be triggered by the repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. Branch 3.5 uses {{FileDescriptor#sync}}, which is not interruptible and does not appear to have the same problem.- (A minimal sketch of that pattern follows this entry.)
    -I was able to find another JIRA ticket describing a similar issue here: https://issues.apache.org/jira/browse/DERBY-4963-
    -There is also interesting discussion in ZOOKEEPER-1835 (where the change was made for 3.5), although these discussions appear to be Windows centric (we noticed the issue on Linux):- https://issues.apache.org/jira/browse/ZOOKEEPER-1835
    -The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build #3241" but jenkins cleared out the logs (I only still have the test report from the mailing list).-
    -In addition, {{testWorkerThreads}} appears to be failing every few months on Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430 and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote this Jenkins had cleaned out the logs from the latest failed run so I have no way of determining if the cause is the same.-
    via Abraham Fine
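    For reference, a minimal sketch of the non-interruptible sync path the (retracted) report contrasts with FileChannel.force(): FileChannel.force() goes through the interruptible-channel machinery and can surface as ClosedByInterruptException, while FileOutputStream.getFD().sync() does not. The class and file name below are illustrative, not ZooKeeper's AtomicFileOutputStream.
{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch of writing a small value (like an epoch file) and flushing
 * it with FileDescriptor.sync(), which is not an interruptible-channel
 * operation.
 */
public class SyncEpochFile {

    static void writeAndSync(String path, long value) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
            out.flush();
            out.getFD().sync();   // blocks until the data is on stable storage;
                                  // a thread interrupt does not close the stream here
        }
    }

    public static void main(String[] args) throws IOException {
        writeAndSync("/tmp/currentEpoch.demo", 4L);   // illustrative path and value
    }
}
{code}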
    • java.io.IOException: The accepted epoch, 1 is less than the current epoch, 2
          at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:554)
          at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
          at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
          at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
          at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
