java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.2:9042 (com.datastax.driver.core.TransportException: [/127.0.0.2:9042] Connection has been closed))

DataStax JIRA | Andy Tolbert | 2 years ago
  1. 0

    I'm encountering a case where the failure of futures returned from {{Session.executeAsync}} is dramatically delayed (over one minute). It seems this is caused by {{cluster.manager.executor}} becoming very backlogged when my hosts enter SUSPECT state and connections are closed while there are many inflight requests:

    {noformat}
    DEBUG [2015-01-28 22:15:24,923] [Hashed wheel timer #1] Error querying /127.0.0.1:9042, trying next host (error is: com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Connection has been closed)
    {noformat}

    If the connection cannot be re-established easily, the executor becomes very backlogged. The problem appears to come from {{RequestHandler.retry}} using the executor to retry a request and {{DCAwareRoundRobinPolicy.waitOnReconnection}} executing on each suspected host:

    {noformat:title=Retry task causing TIMED_WAITING in cluster.manager.executor thread}
    "Cassandra Java Driver worker-0" nid=36 state=TIMED_WAITING
    - waiting on <0x1b072aaf> (a com.google.common.util.concurrent.ListenableFutureTask)
    - locked <0x1b072aaf> (a com.google.common.util.concurrent.ListenableFutureTask)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:422)
    at java.util.concurrent.FutureTask.get(FutureTask.java:199)
    at com.datastax.driver.core.policies.DCAwareRoundRobinPolicy.waitOnReconnection(DCAwareRoundRobinPolicy.java:368)
    at com.datastax.driver.core.policies.DCAwareRoundRobinPolicy.access$100(DCAwareRoundRobinPolicy.java:56)
    at com.datastax.driver.core.policies.DCAwareRoundRobinPolicy$1.computeNext(DCAwareRoundRobinPolicy.java:310)
    at com.datastax.driver.core.policies.DCAwareRoundRobinPolicy$1.computeNext(DCAwareRoundRobinPolicy.java:279)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.datastax.driver.core.policies.TokenAwarePolicy$1.computeNext(TokenAwarePolicy.java:157)
    at com.datastax.driver.core.policies.TokenAwarePolicy$1.computeNext(TokenAwarePolicy.java:142)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    - java.util.concurrent.ThreadPoolExecutor$Worker@e1e012a
    {noformat}

    {noformat:title=Example of future taking longer than a minute to complete}
    ERROR [2015-01-28 22:16:09,638] [Cassandra Java Driver worker-6] [Delete] Timed out request failed after 69495ms.
    java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.2:9042 (com.datastax.driver.core.TransportException: [/127.0.0.2:9042] Connection has been closed))
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    ...
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:203)
    at com.datastax.driver.core.DefaultResultSetFuture.onException(DefaultResultSetFuture.java:137)
    at com.datastax.driver.core.RequestHandler.setFinalException(RequestHandler.java:250)
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.2:9042 (com.datastax.driver.core.TransportException: [/127.0.0.2:9042] Connection has been closed))
    ... 5 more
    {noformat}

    The executor is intended for processing non-blocking tasks, but this particular task can become blocking if the host is in a suspected state (and also depending on the implementation of the LBP in use). I therefore think we should move this task (the work in {{RequestHandler.retry}}) to a different executor, as it could delay important work such as triggering up and down events. It is probably ok to delay setting an exception on the Future compared to other important work that uses the executor.

    I should note that as soon as I had a non-suspected host, these tasks completed quickly.

    Example showing backlogged executor: !executor_backlogged.png|thumbnail!

    The test scenario used to reproduce this:

    # Configure read timeout of 500ms, connection timeout of 100ms.
    # Send continual queries (reads, selects, deletes, writes, etc. - sent up to 2400 simultaneous requests on 3 local nodes)
    # Execute script that does {{kill -STOP <pid>}} and then {{kill -CONT <pid>}} repeatedly on cassandra nodes. ([^suspend_nodes.sh])

    It takes a non-ideal scenario to create this, so the severity may be lower than 'Major', though I suppose it is theoretically possible for this to happen without these parameters (many inflight requests while a connection is closing and all hosts are suspected).
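
The report's proposal, keeping potentially blocking retry work off the executor that handles short tasks, can be illustrated with a minimal, driver-independent sketch. Everything here is hypothetical: the class name, pool sizes, and sleep durations are assumptions chosen for the demonstration, and none of it is the driver's actual code. The sleeping tasks stand in for retries that wait on reconnection; the "quick task" stands in for short work such as an up/down event.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not part of the DataStax driver.
class ExecutorIsolationSketch {

    static final int POOL_SIZE = 2;        // assumed pool size
    static final int BLOCKING_TASKS = 4;   // simulated blocked retries
    static final long BLOCK_MILLIS = 200;  // simulated wait per retry

    // Returns how long a quick task waits before it starts running.
    static long quickTaskDelayMillis(boolean isolateBlockingWork) {
        ExecutorService mainExecutor = Executors.newFixedThreadPool(POOL_SIZE);
        // Either a dedicated pool for blocking work, or the shared one.
        ExecutorService blockingExecutor = isolateBlockingWork
                ? Executors.newFixedThreadPool(POOL_SIZE)
                : mainExecutor;
        try {
            for (int i = 0; i < BLOCKING_TASKS; i++) {
                blockingExecutor.execute(() -> sleepQuietly(BLOCK_MILLIS));
            }
            long submittedAt = System.nanoTime();
            long[] startedAt = new long[1];
            CountDownLatch started = new CountDownLatch(1);
            mainExecutor.execute(() -> {
                startedAt[0] = System.nanoTime();
                started.countDown();
            });
            if (!started.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("quick task never started");
            }
            return TimeUnit.NANOSECONDS.toMillis(startedAt[0] - submittedAt);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            mainExecutor.shutdownNow();
            blockingExecutor.shutdownNow();
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Interrupted by shutdownNow(); restore the flag and return.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared executor delay (ms):   " + quickTaskDelayMillis(false));
        System.out.println("isolated executor delay (ms): " + quickTaskDelayMillis(true));
    }
}
```

With a shared pool, the quick task queues behind every blocked task; with a dedicated pool for blocking work it starts almost immediately. The trade-off is the one the report already accepts: work on the dedicated pool (here, failing the future) may still be delayed, but it no longer holds up unrelated tasks.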

    DataStax JIRA | 2 years ago | Andy Tolbert
    java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.2:9042 (com.datastax.driver.core.TransportException: [/127.0.0.2:9042] Connection has been closed))
  3. 0

    missing dep on genrule() claims its a buck bug

    GitHub | 4 years ago | spearce
    java.util.concurrent.ExecutionException: java.lang.RuntimeException: No dep named //:bin in ${:bin}>$OUT
  5. 0

    I can enter all information about my cloud, and it tests ok and accepts it. When I go to launch a new slave though, I get the following snippet:

    Stack trace

    javax.servlet.ServletException: java.lang.RuntimeException: org.jclouds.compute.RunNodesException: error running 1 node group(jenkins-slave) location(regionOne) image(cdaa7dff-8bf6-4dc7-a84a-2802e23c0c94) size(1) options({loginPrivateKeyPresent=true, scriptPresent=true, userMetadata={Name=jenkins-slave}, userData=[B@4a0d6539})
    Execution failures:
    1) ExecutionException on jenkins-slave-b77:
    java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST http://130.20.232.220:8774/v2/84ad49ccecf2478b9317b332c1490bd4/servers HTTP/1.1 failed with response: HTTP/1.1 400 null; content: [{"badRequest": {"message": "Multiple possible networks found, use a Network ID to be more specific.", "code": 400}}]
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.jclouds.concurrent.FutureIterables$1.run(FutureIterables.java:125)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
    Caused by: org.jclouds.http.HttpResponseException: command: POST http://130.20.232.220:8774/v2/84ad49ccecf2478b9317b332c1490bd4/servers HTTP/1.1 failed with response: HTTP/1.1 400 null; content: [{"badRequest": {"message": "Multiple possible networks found, use a Network ID to be more specific.", "code": 400}}]
    ...

    There seems to be no way to specify a network_id, which is important to our setup.

    Jenkins JIRA | 3 years ago | Kevin Fox
    java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST http://130.20.232.220:8774/v2/84ad49ccecf2478b9317b332c1490bd4/servers HTTP/1.1 failed with response: HTTP/1.1 400 null; content: [{"badRequest": {"message": "Multiple possible networks found, use a Network ID to be more specific.", "code": 400}}]
  6. 0

    [JENKINS-21186] multiple network failure - Jenkins JIRA

    jenkins-ci.org | 8 months ago
    java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST HTTP/1.1 failed with response: HTTP/1.1 400 null; content:

Root Cause Analysis

  1. java.util.concurrent.ExecutionException

    com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.2:9042 (com.datastax.driver.core.TransportException: [/127.0.0.2:9042] Connection has been closed))

    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue()
  2. Guava
    AbstractFuture.get
    1. com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    2. com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
    3. com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    3 frames