com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response

DataStax JIRA | Sotirios Delimanolis | 4 months ago
  1. 0

    When trying to query multiple rows where the fetch size is smaller than the number of rows, the first page returns successfully, but the next stalls indefinitely until {{OperationTimedOutException}} is thrown and stops it. However, the {{Connection}} logs show that after the `OperationTimedOutException` is raised, the request unblocks and Cassandra returns that next page. Here's how I can reproduce it. I'm running this example on a single Cassandra 2.2.7 node locally (my laptop). Create a simple table with a clustering key {noformat} CREATE TABLE pillar.cluster (id text, rank text, val text, primary KEY (id,rank)) WITH CLUSTERING ORDER BY (rank DESC ); {noformat} Create a few rows for the same partition {noformat} INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','1','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','2','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','3','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','4','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','5','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','6','asdas'); INSERT INTO pillar.cluster (id, rank , val ) VALUES ( '1','7','asdas'); {noformat} And run this code {noformat} public class Example { public static void main(String[] args) throws Exception { QueryOptions options = new QueryOptions(); options.setFetchSize(2); Cluster cluster = Cluster.builder().addContactPoint("localhost").withQueryOptions(options).withProtocolVersion(ProtocolVersion.V3).build(); Session session = cluster.connect(); SimpleStatement bs = new SimpleStatement("SELECT * FROM pillar.cluster WHERE id = '1'"); ResultSetFuture resultSetFuture = session.executeAsync(bs); CountDownLatch countDownLatch = new CountDownLatch(1); Futures.addCallback(resultSetFuture, new FutureCallback<ResultSet>() { @Override public void onSuccess(ResultSet resultSet) { try { resultSet.all(); } finally { countDownLatch.countDown(); } } @Override public void onFailure(Throwable throwable) { countDownLatch.countDown(); } }); countDownLatch.await(); Thread.sleep(100); cluster.close(); } } {noformat} Here are the TRACE logs for {{com.datastax.driver.core}} starting at the first page request sent {noformat} 18:37:01.362 [main] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=1, closed=false], stream 128, writing request QUERY SELECT * FROM pillar.cluster WHERE id = '1'([cl=LOCAL_ONE, positionalVals=[], namedVals={}, skip=false, psize=2, state=null, serialCl=SERIAL]) 18:37:01.363 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=1, closed=false], stream 128, request sent successfully 18:37:01.366 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=1, closed=false], stream 128, received: ROWS [id (varchar)][rank (varchar)][val (varchar)] (to be continued) | varchar '1' | varchar '7' | varchar 'asdas' | varchar '1' | varchar '6' | varchar 'asdas' --- 18:37:01.373 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=1, closed=false], stream 192, writing request QUERY SELECT * FROM pillar.cluster WHERE id = '1'([cl=LOCAL_ONE, positionalVals=[], namedVals={}, skip=false, psize=2, state=java.nio.HeapByteBuffer[pos=0 lim=18 cap=18], serialCl=SERIAL]) 18:37:13.447 [cluster1-timeouter-0] WARN com.datastax.driver.core.RequestHandler - Not retrying statement because it is not idempotent (this message will be logged only once). Note that this version of the driver changes the default retry behavior for non-idempotent statements: they won't be automatically retried anymore. The driver marks statements non-idempotent by default, so you should explicitly call setIdempotent(true) if your statements are safe to retry. See http://goo.gl/4HrSby for more details. Aug 18, 2016 6:37:13 PM com.google.common.util.concurrent.AbstractFuture executeListener SEVERE: RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@152b494a with executor MoreExecutors.directExecutor() com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:44) at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:26) at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:312) at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:269) at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:143) at com.datastax.driver.core.ArrayBackedResultSet.all(ArrayBackedResultSet.java:132) at com.example.Example$1.onSuccess(Example.java:30) at com.example.Example$1.onSuccess(Example.java:1) at com.google.common.util.concurrent.Futures$6.run(Futures.java:1773) at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817) at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:753) at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:613) at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:174) at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:174) at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:43) at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:793) at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:496) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) at java.lang.Thread.run(Thread.java:745) Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:766) at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1267) at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581) at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655) at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367) ... 1 more 18:37:13.483 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=false], stream 192, request sent successfully 18:37:13.486 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=false], stream 192, received: ROWS [id (varchar)][rank (varchar)][val (varchar)] (to be continued) | varchar '1' | varchar '5' | varchar 'asdas' | varchar '1' | varchar '4' | varchar 'asdas' --- 18:37:13.486 [cluster1-nio-worker-1] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=false] Response received on stream 192 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is ROWS [id (varchar)][rank (varchar)][val (varchar)] (to be continued) | varchar '1' | varchar '5' | varchar 'asdas' | varchar '1' | varchar '4' | varchar 'asdas' --- 18:37:13.554 [main] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-1, inFlight=0, closed=true] closing connection 18:37:13.556 [main] DEBUG com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=true] closing connection {noformat} You'll notice the driver shows that it sends the second request {noformat} 18:37:01.373 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=1, closed=false], stream 192, writing request QUERY SELECT * FROM pillar.cluster WHERE id = '1'([cl=LOCAL_ONE, positionalVals=[], namedVals={}, skip=false, psize=2, state=java.nio.HeapByteBuffer[pos=0 lim=18 cap=18], serialCl=SERIAL]) {noformat} and after the exception occurs (12 seconds is the default read timeout in the {{SocketOptions}}), the request is successfully sent and the next page (fetch size of 2) is received. {noformat} 18:37:13.483 [cluster1-nio-worker-1] TRACE com.datastax.driver.core.Connection - Connection[localhost/127.0.0.1:9042-2, inFlight=0, closed=false], stream 192, request sent successfully {noformat} It doesn't matter how long the read timeout is. On the Cassandra side, TRACE logs for {{Message}} and {{org.apache.cassandra.service}} show me that it receives the second page 12 seconds later {noformat} TRACE [SharedPool-Worker-1] 2016-08-18 18:37:01,363 Message.java:506 - Received: QUERY SELECT * FROM pillar.cluster WHERE id = '1', v=3 DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:01,364 SliceQueryPager.java:92 - Querying next page of slice query; new filter: SliceQueryFilter [reversed=false, slices=[[, ]], count=2, toGroup = 1] TRACE [SharedPool-Worker-1] 2016-08-18 18:37:01,364 ReadCallback.java:76 - Blockfor is 1; setting up requests to localhost/127.0.0.1 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:01,364 AbstractReadExecutor.java:118 - reading data locally DEBUG [SharedPool-Worker-2] 2016-08-18 18:37:01,364 FileCacheService.java:102 - Evicting cold readers for /Users/selimanolis/git/cassandra/data/data/pillar/cluster-1d8e08d0659c11e689b71db7b20aab7e/lb-3-big-Data.db DEBUG [SharedPool-Worker-2] 2016-08-18 18:37:01,365 FileCacheService.java:102 - Evicting cold readers for /Users/selimanolis/git/cassandra/data/data/pillar/cluster-1d8e08d0659c11e689b71db7b20aab7e/lb-2-big-Data.db TRACE [SharedPool-Worker-2] 2016-08-18 18:37:01,365 FileCacheService.java:151 - Estimated memory usage is 393216 compared to actual usage 393216 TRACE [SharedPool-Worker-2] 2016-08-18 18:37:01,365 FileCacheService.java:151 - Estimated memory usage is 458752 compared to actual usage 458752 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:01,365 StorageProxy.java:1449 - Read: 1 ms. DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:01,365 AbstractQueryPager.java:95 - Fetched 2 live rows DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:01,365 AbstractQueryPager.java:133 - Remaining rows to page: 2147483645 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:01,366 Message.java:525 - Responding: ROWS [id(pillar, cluster), org.apache.cassandra.db.marshal.UTF8Type][rank(pillar, cluster), org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.UTF8Type)][val(pillar, cluster), org.apache.cassandra.db.marshal.UTF8Type] (to be continued) | 1 | 7 | asdas | 1 | 6 | asdas ---, v=3 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:13,483 Message.java:506 - Received: QUERY SELECT * FROM pillar.cluster WHERE id = '1', v=3 DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:13,484 SliceQueryPager.java:92 - Querying next page of slice query; new filter: SliceQueryFilter [reversed=false, slices=[[00013600000376616c00, ]], count=3, toGroup = 1] TRACE [SharedPool-Worker-1] 2016-08-18 18:37:13,484 ReadCallback.java:76 - Blockfor is 1; setting up requests to localhost/127.0.0.1 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:13,484 AbstractReadExecutor.java:118 - reading data locally DEBUG [SharedPool-Worker-2] 2016-08-18 18:37:13,484 FileCacheService.java:102 - Evicting cold readers for /Users/selimanolis/git/cassandra/data/data/pillar/cluster-1d8e08d0659c11e689b71db7b20aab7e/lb-3-big-Data.db DEBUG [SharedPool-Worker-2] 2016-08-18 18:37:13,485 FileCacheService.java:102 - Evicting cold readers for /Users/selimanolis/git/cassandra/data/data/pillar/cluster-1d8e08d0659c11e689b71db7b20aab7e/lb-2-big-Data.db TRACE [SharedPool-Worker-2] 2016-08-18 18:37:13,485 FileCacheService.java:151 - Estimated memory usage is 393216 compared to actual usage 0 TRACE [SharedPool-Worker-2] 2016-08-18 18:37:13,485 FileCacheService.java:151 - Estimated memory usage is 458752 compared to actual usage 65536 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:13,485 StorageProxy.java:1449 - Read: 1 ms. DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:13,485 AbstractQueryPager.java:95 - Fetched 3 live rows DEBUG [SharedPool-Worker-1] 2016-08-18 18:37:13,485 AbstractQueryPager.java:133 - Remaining rows to page: 2147483643 TRACE [SharedPool-Worker-1] 2016-08-18 18:37:13,485 Message.java:525 - Responding: ROWS [id(pillar, cluster), org.apache.cassandra.db.marshal.UTF8Type][rank(pillar, cluster), org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.UTF8Type)][val(pillar, cluster), org.apache.cassandra.db.marshal.UTF8Type] (to be continued) | 1 | 5 | asdas | 1 | 4 | asdas ---, v=3 {noformat} Something must be blocking the client's second page request on the connection. Is it my configuration, a bug in the driver, or something in netty?

    DataStax JIRA | 4 months ago | Sotirios Delimanolis
    com.datastax.driver.core.exceptions.OperationTimedOutException: [localhost/127.0.0.1:9042] Timed out waiting for server response

    Root Cause Analysis

    1. com.datastax.driver.core.exceptions.OperationTimedOutException

      [localhost/127.0.0.1:9042] Timed out waiting for server response

      at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout()
    2. DataStax Java Driver for Apache Cassandra - Core
      Connection$ResponseHandler$1.run
      1. com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:766)
      2. com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1267)
      2 frames
    3. Netty
      HashedWheelTimer$Worker.run
      1. io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
      2. io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
      3. io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
      3 frames
    4. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:745)
      1 frame