No space left on device

Apache's JIRA Issue Tracker | David Arthur | 4 years ago
  1.

    The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:

        2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
        No space left on device
        at Method)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(
        at org.apache.zookeeper.server.ZKDatabase.commit(
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(

    Then many subsequent exceptions like:

        2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
        2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
        at org.apache.jute.BinaryInputArchive.readInt(
        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(
        at org.apache.zookeeper.server.persistence.FileTxnLog$
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(

    It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?

    Apache's JIRA Issue Tracker | 4 years ago | David Arthur | No space left on device
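    The "Last transaction was partial" failure above is exactly the situation a checksummed, length-prefixed log format is meant to survive: a torn tail record is detected and dropped at recovery time instead of aborting startup. Below is a minimal, generic sketch in Java of that idea; it is not ZooKeeper's actual FileTxnLog code, and the class and method names (ChecksummedLog, append, recover) are made up for illustration.

        import java.io.BufferedInputStream;
        import java.io.DataInputStream;
        import java.io.EOFException;
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.zip.CRC32;

        // Hypothetical ChecksummedLog, for illustration only.
        public class ChecksummedLog {

            // Append one record as [int length][long crc32][payload], then fsync.
            // If the disk is full, the write or force() fails with an IOException
            // ("No space left on device") and may leave a torn tail record.
            static void append(Path log, byte[] payload) throws IOException {
                CRC32 crc = new CRC32();
                crc.update(payload);
                ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
                buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
                buf.flip();
                try (FileChannel ch = FileChannel.open(log,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                        StandardOpenOption.APPEND)) {
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                    ch.force(true);
                }
            }

            // Read records until EOF; a torn or corrupt trailing record is
            // dropped instead of making recovery fail outright.
            static List<byte[]> recover(Path log) throws IOException {
                List<byte[]> records = new ArrayList<>();
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(log)))) {
                    while (true) {
                        int len;
                        try {
                            len = in.readInt();
                        } catch (EOFException cleanEnd) {
                            break;                 // normal end of log
                        }
                        if (len < 0) {
                            break;                 // garbage length: torn record
                        }
                        long expected = in.readLong();
                        byte[] payload = new byte[len];
                        in.readFully(payload);     // throws EOFException on a torn tail
                        CRC32 crc = new CRC32();
                        crc.update(payload);
                        if (crc.getValue() != expected) {
                            break;                 // corrupt record: stop here
                        }
                        records.add(payload);
                    }
                } catch (EOFException tornTail) {
                    // The last record was only partially written; ignore it.
                }
                return records;
            }
        }

    The sketch only illustrates the detect-and-truncate idea; it says nothing about how ZooKeeper itself chooses to recover once a partial transaction has been detected, which is what the reports on this page are about.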
  2.

    AppScale startup hangs when there is no disk space left

    GitHub | 4 years ago | jovanchohan | No space left on device
  3.

    A ZooKeeper cluster completely stalls, with *no* transactions making progress, when a storage-related error (such as *ENOSPC, EDQUOT, EIO*) is encountered by the current *leader*. Surprisingly, the same errors in some circumstances cause the node to crash completely, thereby allowing other nodes in the cluster to become the leader and make progress with transactions. Interestingly, the same errors, if encountered while initializing a new log file, cause the current leader to go into a weird state (but not crash) where it still thinks it is the leader (and so does not allow others to become the leader). *This causes the entire cluster to freeze.*

    Here is the stack trace of the leader:

        2016-07-11 15:42:27,502 [myid:3] - INFO [SyncThread:3:FileTxnLog@199] - Creating new log file: log.200000001
        2016-07-11 15:42:27,505 [myid:3] - ERROR [SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:3
        Disk quota exceeded
        at Method)
        at org.apache.zookeeper.server.persistence.FileTxnLog.append(
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(
        at org.apache.zookeeper.server.ZKDatabase.append(

    From the trace and the code, it looks like the problem happens only when a new log file is initialized, and only when there are errors in two cases:

    1. Error during the append of the *log header*.
    2. Error during *padding zero bytes to the end of the log*.

    If similar errors happen while writing some other blocks of data, the node just crashes completely, allowing others to be elected as a new leader. These two blocks of the newly created log file are special in that they take a different error-recovery code path: the node does not crash completely, but rather certain threads are killed while supposedly the quorum-holding thread stays up, thereby preventing others from becoming the new leader. This causes the other nodes to think that there is no problem with the leader, but the cluster becomes unavailable for any subsequent operations such as reads and writes.

    Apache's JIRA Issue Tracker | 3 months ago | Ramnatthan Alagappan | Disk quota exceeded
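    The failure mode described above is a partial-fail-stop problem: one critical thread dies while the process as a whole keeps holding leadership. A common remedy is to escalate any unrecoverable error on such a thread to a whole-process exit so the rest of the ensemble can elect a new leader. The sketch below shows that pattern in generic Java; it is not ZooKeeper's ZooKeeperCriticalThread, and the names (FailFastThreads, newCriticalThread) are hypothetical.

        // Hypothetical fail-fast helper, for illustration only (this is not
        // ZooKeeper's ZooKeeperCriticalThread). If a thread doing critical disk
        // I/O dies with an unrecoverable error such as ENOSPC or EDQUOT, halt
        // the whole process so the rest of the ensemble can elect a new leader,
        // instead of leaving a half-dead leader that still holds the quorum.
        public class FailFastThreads {

            static Thread newCriticalThread(String name, Runnable task) {
                Thread t = new Thread(task, name);
                t.setUncaughtExceptionHandler((thread, error) -> {
                    System.err.println("Severe unrecoverable error, from thread : "
                            + thread.getName() + " - " + error);
                    // halt() skips shutdown hooks; the non-zero exit code tells
                    // the supervisor (systemd, supervisord, ...) to restart or
                    // fence this node.
                    Runtime.getRuntime().halt(1);
                });
                return t;
            }

            public static void main(String[] args) {
                Thread sync = newCriticalThread("SyncThread", () -> {
                    // Simulated disk-full failure while appending to the log.
                    throw new RuntimeException(
                            new java.io.IOException("Disk quota exceeded"));
                });
                sync.start();
            }
        }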
  4.

    CI builds failing due to lack of space

    GitHub | 2 years ago | rgladwell | No space left on device
  5.

    Hey, I've seen that this error has been around for some time, and I hope that this description is complete and will be helpful in reproducing and fixing it.

    System: a logback.xml with a file appender with prudent=true. The log path should be on a volume with little available space.

    Scenario: start writing to the log file. As soon as the space is depleted, errors start happening:

        15:44:40,595 |-ERROR in c.q.l.c.recovery.ResilientFileOutputStream@1944673755 - IO failure while writing to file [/Volumes/TESTVOL/logs/my-log.2015-02-01.log]
        No space left on device
        at Method)
        at ch.qos.logback.core.recovery.ResilientOutputStreamBase.flush(
        [...]
        15:44:51,064 |-INFO in c.q.l.c.recovery.ResilientFileOutputStream@1944673755 - Attempting to recover from IO failure on file [/Volumes/TESTVOL/logs/my-log.2015-02-01.log]
        15:44:51,064 |-INFO in c.q.l.c.recovery.ResilientFileOutputStream@1944673755 - Recovered from IO failure on file [/Volumes/TESTVOL/logs/my-log.2015-02-01.log]
        15:44:51,064 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[MY_LOG] - IO failure in appender
        java.nio.channels.ClosedChannelException

    and then:

        15:44:51,069 |-WARN in ch.qos.logback.core.rolling.RollingFileAppender[MY_LOG] - Attempted to append to non started appender [MY_LOG].

    Debugging: I have investigated this issue and found the culprit to be line 204 in FileAppender:

        finally {
            if (fileLock != null) {
        --->    fileLock.release();
            }
            [...]

    The problem is that when the original IOException was thrown, the channel was closed as part of the attemptRecovery method in ResilientOutputStreamBase. The release will throw a ClosedChannelException if the file channel is closed. The appender is then set to started=false in the OutputStreamAppender subAppend method and stays this way until restarted.

    Fix suggestion: the easy fix here is changing the guard of the release:

        finally {
            if (fileLock != null && fileChannel.isOpen()) {
                fileLock.release();
            }
            [...]

    This prevents the release from throwing the exception. For now, an easy mitigation (if possible) is to set prudent=false. I hope this helps and the bug will be fixed.

    JIRA | 2 years ago | Nadav Wexler | No space left on device
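    To make the suggested guard concrete, here is a self-contained sketch of the same pattern in plain Java; it is generic code, not logback's actual FileAppender, and the names (GuardedLockedWrite, lockedAppend) are made up. The point is only the finally block: release the FileLock just when the channel is still open and the lock is still valid, so that release() cannot throw ClosedChannelException after a recovery path has closed the channel.

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.channels.FileLock;
        import java.nio.file.Paths;
        import java.nio.file.StandardOpenOption;

        // Generic illustration of the guarded release, not logback's code.
        public class GuardedLockedWrite {

            static void lockedAppend(FileChannel channel, byte[] bytes) throws IOException {
                FileLock fileLock = null;
                try {
                    fileLock = channel.lock();
                    channel.position(channel.size());
                    channel.write(ByteBuffer.wrap(bytes));
                    // A full disk fails around here; recovery code elsewhere
                    // may then close the channel.
                    channel.force(false);
                } finally {
                    // The guard from the report: without the isOpen() check,
                    // release() throws ClosedChannelException and the appender
                    // is stopped for good.
                    if (fileLock != null && channel.isOpen() && fileLock.isValid()) {
                        fileLock.release();
                    }
                }
            }

            public static void main(String[] args) throws IOException {
                try (FileChannel ch = FileChannel.open(Paths.get("my-log.log"),
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    lockedAppend(ch, "hello\n".getBytes());
                }
            }
        }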


Root Cause Analysis


  1. No space left on device

  2. Java RT
    1. Method)
    4 frames
  3. Zookeeper
    1. org.apache.zookeeper.server.persistence.FileTxnLog.commit(
    2. org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(
    3. org.apache.zookeeper.server.ZKDatabase.commit(
    4. org.apache.zookeeper.server.SyncRequestProcessor.flush(
    5 frames