java.lang.RuntimeException: Unable to run quorum server

Apache's JIRA Issue Tracker | Athyab Ameer | 3 months ago
  1. 0

    Possible Cluster Unavailability

    I am running a three-node ZooKeeper cluster. Each node runs Linux. I see the following sequence of system calls when ZooKeeper appends a user data item to the log file:

        1 write("/data/version-2/log.200000001", offset=65, count=12)
        2 write("/data/version-2/log.200000001", offset=77, count=16323)
        3 write("/data/version-2/log.200000001", offset=16400, count=4209)
        4 write("/data/version-2/log.200000001", offset=20609, count=1)
        5 fdatasync("/data//version-2/log.200000001")

    A crash could happen just after operation 4 but before the final fdatasync. In this situation, the file system could persist the 4th operation and fail to persist the 3rd operation, because there is no fsync in between them. In such cases, the ZooKeeper server fails to start with the following messages in its log file:

        [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /tmp/zoo2.cfg
        [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.2 to address: /127.0.0.2
        [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.4 to address: /127.0.0.4
        [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - Resolved hostname: 127.0.0.3 to address: /127.0.0.3
        [myid:] - INFO [main:QuorumPeerConfig@331] - Defaulting to majority quorums
        [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
        [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
        [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
        [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
        [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2182
        [myid:1] - INFO [main:QuorumPeer@1019] - tickTime set to 2000
        [myid:1] - INFO [main:QuorumPeer@1039] - minSessionTimeout set to -1
        [myid:1] - INFO [main:QuorumPeer@1050] - maxSessionTimeout set to -1
        [myid:1] - INFO [main:QuorumPeer@1065] - initLimit set to 5
        [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /data/version-2/snapshot.100000002
        [myid:1] - ERROR [main:QuorumPeer@557] - Unable to load database on disk
        java.io.IOException: CRC check failed
            at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
            at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
            at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
            at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
            at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
        2016-04-15 04:00:32,795 [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
        java.lang.RuntimeException: Unable to run quorum server
            at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
            at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
            at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
        Caused by: java.io.IOException: CRC check failed
            at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
            at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
            at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
            at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
            ... 4 more

    The same happens when the 3rd and 4th writes hit the disk but the 2nd operation does not. Two nodes of a three-node cluster can easily reach this state, rendering the entire cluster unavailable. On recovery, ZooKeeper should be able to handle such checksum mismatches gracefully to maintain cluster availability.
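
    The failure mode described in this report can be reproduced in miniature. The sketch below is not ZooKeeper code: the class TornWriteDemo and its methods are hypothetical, Adler32 is only a stand-in checksum, and treating writes 2 and 3 as one checksummed payload whose lost part reads back as zeros is an assumption. It only shows why a write that never reached the disk makes a stored checksum stop matching.

```java
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical demo class; not part of ZooKeeper.
class TornWriteDemo {
    // Checksum over a whole buffer (Adler32 is a stand-in for this demo).
    static long checksum(byte[] data) {
        Adler32 sum = new Adler32();
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    // Simulates the crash: every write is persisted except the one covering
    // [lostOff, lostOff + lostLen), which is assumed to read back as zeros.
    static byte[] crashWithout(byte[] intended, int lostOff, int lostLen) {
        byte[] onDisk = Arrays.copyOf(intended, intended.length);
        Arrays.fill(onDisk, lostOff, lostOff + lostLen, (byte) 0);
        return onDisk;
    }

    public static void main(String[] args) {
        // Sizes taken from the reported trace: write 2 carries 16323 bytes
        // and write 3 carries 4209 bytes.
        int write2 = 16323, write3 = 4209;
        byte[] intended = new byte[write2 + write3];
        Arrays.fill(intended, (byte) 'x');
        long stored = checksum(intended); // value recorded before the crash

        byte[] lost3 = crashWithout(intended, write2, write3);
        byte[] lost2 = crashWithout(intended, 0, write2);

        System.out.println("intact matches:       " + (checksum(intended) == stored));
        System.out.println("write 3 lost matches: " + (checksum(lost3) == stored));
        System.out.println("write 2 lost matches: " + (checksum(lost2) == stored));
    }
}
```

    Only the intact buffer matches the stored checksum; losing either write 2 or write 3 produces a mismatch, which corresponds to the two cases the report describes.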

    Apache's JIRA Issue Tracker | 3 months ago | Athyab Ameer
    java.lang.RuntimeException: Unable to run quorum server
  2. 0

    Possible Cluster Unavailability

    Apache's JIRA Issue Tracker | 3 months ago | Ramnatthan Alagappan
    java.lang.RuntimeException: Unable to run quorum server
  3. 0

    Unable to start Zookeeper with 2 servers in the quorum

    Stack Overflow | 3 years ago | dipteshc
    java.lang.RuntimeException: My id 11111111111 not in the peer list
  5. 0

    zookeeper server error: My id 4 not in the peer list

    Stack Overflow | 3 years ago | jaksky
    java.lang.RuntimeException: My id 4 not in the peer list at org.apache.zookeeper.server.quorum.QuorumPeer.startLeaderElection(QuorumPeer.java:479)
  6. 0

    Unable to view namespaces, add namespace servers, etc. in DFS Managment | Windows Server - File Services and Storage

    solutionscore.com | 1 year ago
    java.lang.RuntimeException: My id 11111111111 not in the peer list

    Root Cause Analysis

    1. java.io.IOException

      CRC check failed

      at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next()
    2. Zookeeper
      QuorumPeerMain.main
      1. org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:635)
      2. org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
      3. org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
      4. org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
      5. org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
      6. org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
      7. org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
      8. org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
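
    The trace shows the startup path aborting as soon as FileTxnIterator.next() hits a bad checksum. As a rough illustration of what handling a mismatch "gracefully" could look like, the sketch below replays a log and stops at the first truncated or corrupt record instead of throwing. Everything in it is hypothetical: the class TolerantReplay, the [checksum][length][payload] record layout, and the choice of Adler32 are assumptions for illustration, not ZooKeeper's on-disk format or its actual recovery logic. Whether dropping the tail is safe in a real deployment depends on the rest of the quorum still holding those transactions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Adler32;

// Hypothetical sketch; the record layout is invented for illustration.
class TolerantReplay {
    static long checksum(byte[] payload) {
        Adler32 sum = new Adler32();
        sum.update(payload, 0, payload.length);
        return sum.getValue();
    }

    // Appends one record as [8-byte checksum][4-byte length][payload].
    static void append(ByteBuffer log, byte[] payload) {
        log.putLong(checksum(payload)).putInt(payload.length).put(payload);
    }

    // Returns the payloads of the longest valid prefix of the log. Replay
    // stops at the first truncated or corrupt record instead of throwing.
    static List<byte[]> replay(ByteBuffer log) {
        List<byte[]> valid = new ArrayList<>();
        while (log.remaining() >= 12) {
            long stored = log.getLong();
            int len = log.getInt();
            if (len < 0 || len > log.remaining()) {
                break; // length field points past the end: truncated tail
            }
            byte[] payload = new byte[len];
            log.get(payload);
            if (checksum(payload) != stored) {
                break; // checksum mismatch: treat as a torn record
            }
            valid.add(payload);
        }
        return valid;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(128);
        append(log, "create /a".getBytes(StandardCharsets.UTF_8));
        append(log, "create /b".getBytes(StandardCharsets.UTF_8));
        append(log, "create /c".getBytes(StandardCharsets.UTF_8));
        // Each record is 12 + 9 = 21 bytes, so the third payload starts at
        // offset 54. Zero its first byte to simulate a write that was lost.
        log.put(54, (byte) 0);
        log.flip();
        System.out.println("records recovered: " + replay(log).size());
    }
}
```

    With the third record damaged, the replay keeps the first two records and discards the rest, so the process can continue to the point where it could resynchronize from its peers.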