org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location

Apache's JIRA Issue Tracker | Josh Elser | 5 months ago
  1. 0

    Saw an error in some $dayjob testing where, while a RegionServer was going down due to an exception, there was a scary-looking exception about being unable to write to the stats table because an hconnection was closed. Pardon the mismatched line numbers:

    {noformat}
    2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
    org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
        at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
        at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
        at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
        at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
        at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: hconnection-0x5314972b closed
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
        at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
        at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
        at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
        ... 17 more
    {noformat}

    Looking into this some more, the async task that updates the stats was still running after the RegionServer had already started shutting down. The RegionServer had already closed all of the "userRegions", but, because this task is async, it was still running and using the RegionServer's CoprocessorHConnection. So the RegionServer thinks all of the user regions are closed and that it is safe to close the HConnection. In reality, there is still code tied to those user regions that might be running (as we can see with the above stacktrace). The next time the StatisticsScannerCallable tries to use the HConnection, it fails with the error above.

    I think the simple fix is to just use the CoprocessorEnvironment to access the RegionServerServices and use the {{isClosing()}} and {{isClosed()}} methods.

    This is all pretty minor because the RegionServer is already shutting down, but it is likely misleading to less-experienced users who would think that the last exception in the log is the problem. Will put up a patch shortly.
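The guard proposed above can be sketched roughly as follows. This is only an illustration of the idea, not the actual Phoenix patch: `ServerState` and `GuardedStatsTask` are hypothetical stand-ins invented here so the example runs without HBase on the classpath, and only the method names `isClosing()` and `isClosed()` are taken from the issue text.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the state a coprocessor could query through its
// environment. It is NOT the real HBase interface.
interface ServerState {
    boolean isClosing();
    boolean isClosed();
}

// Hypothetical stand-in for the async statistics task.
final class GuardedStatsTask implements Callable<Boolean> {
    private final ServerState state;
    private final Runnable updateStats;

    GuardedStatsTask(ServerState state, Runnable updateStats) {
        this.state = state;
        this.updateStats = updateStats;
    }

    // Returns true if the update ran, false if it was skipped because the
    // server reported that it is shutting down.
    @Override
    public Boolean call() {
        if (state.isClosing() || state.isClosed()) {
            return false; // skip quietly instead of failing on a closed connection
        }
        updateStats.run();
        return true;
    }
}

final class GuardedStatsTaskDemo {
    public static void main(String[] args) {
        AtomicBoolean closing = new AtomicBoolean(false);
        ServerState state = new ServerState() {
            public boolean isClosing() { return closing.get(); }
            public boolean isClosed() { return false; }
        };
        GuardedStatsTask task =
            new GuardedStatsTask(state, () -> System.out.println("stats updated"));

        System.out.println("ran: " + task.call());
        closing.set(true); // simulate the server starting to shut down
        System.out.println("ran: " + task.call());
    }
}
```

A check like this only narrows the window: the server can still begin closing between the check and the actual table access, so a real task would presumably also need to tolerate the resulting IOException.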

  2. 0

    An internal python client has been getting the stack trace below since HBASE-13437:

    {code}
    2015-09-30 11:27:31,670 runner ERROR : scheduler executor error
    2015-09-30 11:27:31,674 runner ERROR : Traceback (most recent call last):
      File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/runner.py", line 82, in run
        fetch_list = self.__scheduler_executor.run()
      File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/scheduler.py", line 35, in run
        with self.__fetch_db_dao.get_scanner() as scanner:
      File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_dao.py", line 57, in get_scanner
        caching=caching, field_filter_list=field_filter_list)
      File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_client_template.py", line 104, in get_entity_scanner
        self.__fix_cfs(self.__filter_columns(field_filter_list)), caching)
      File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_entity_scanner.py", line 81, in open
        self.__scanner_id = client.scannerOpenWithScan(table_name, scan)
      File "/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py", line 1494, in scannerOpenWithScan
        return self.recv_scannerOpenWithScan()
      File "/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py", line 1518, in recv_scannerOpenWithScan
        raise result.io
    IOError: IOError(message="org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location\n\tat org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308)\n\tat org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149)\n\tat org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)\n\tat org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)\n\tat org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293)\n\tat org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)\n\tat org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)\n\tat org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)\n\tat org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)\n\tat org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1446)\n\tat sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)\n\tat com.sun.proxy.$Proxy14.scannerOpenWithScan(Unknown Source)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4609)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4593)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)\n\tat org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.io.IOException: hconnection-0xa8e1bf9 closed\n\tat org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)\n\tat org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299)\n\t... 23 more\n")
    {code}

    On the thrift server side we see this:

    {code}
    2015-09-30 07:22:59,427 ERROR org.apache.hadoop.hbase.client.AsyncProcess: Failed to get region location
    java.io.IOException: hconnection-0x4142991e closed
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1496)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1107)
        at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRowTs(ThriftServerRunner.java:1256)
        at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRow(ThriftServerRunner.java:1209)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
        at com.sun.proxy.$Proxy14.mutateRow(Unknown Source)
        at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4334)
        at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4318)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)
        at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    {code}

    HBASE-13437 has us actually execute a close on timeout; before, we'd mark the connection closed but would never call close on it. A background chore goes around stamping Connections in the ConnectionCache as 'closed' if they have not been used in ten minutes. The 'close' can come in at any time, in particular between the point at which we get the table/connection and the point at which we go to use it, i.e. when we flush puts. It is at the flush-puts point that we get the above 'AsyncProcess: Failed to get region location' (it is not a failure to find the region location but rather our noticing that the connection has been closed).

    Attempts at reproducing this issue locally by letting the Connection time out can generate the above exception if a certain dance is done, but it is hard to do; I am not reproducing the actual usage by the aforementioned client. Next steps would be setting up a python client talking via thrift and then trying to use a connection after it has been evicted from the connection cache. Another thing to try is a pool of connections on the python side... connections are identified by user and table.
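The get-then-use window described above can be reproduced deterministically with a toy model. Everything in this sketch is a hypothetical stand-in (`FakeConnection`, `IdleConnectionCache`, the injected clock, keying by user only); it does not use or mirror the real HBase ConnectionCache API, and the re-fetch in the `catch` block is just one possible mitigation, not the fix adopted in the issue.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a cached client connection.
final class FakeConnection {
    private final String name;
    private boolean closed;
    long lastUsedMillis;

    FakeConnection(String name, long now) {
        this.name = name;
        this.lastUsedMillis = now;
    }

    void close() { closed = true; }

    // Mimics the failure mode in the traces: any use after close fails.
    void put(String row) throws IOException {
        if (closed) {
            throw new IOException(name + " closed");
        }
    }
}

// Hypothetical cache that closes and evicts connections idle for too long.
final class IdleConnectionCache {
    private final Map<String, FakeConnection> byUser = new HashMap<>();
    private final LongSupplier clock;
    private final long maxIdleMillis;
    private int created;

    IdleConnectionCache(LongSupplier clock, long maxIdleMillis) {
        this.clock = clock;
        this.maxIdleMillis = maxIdleMillis;
    }

    synchronized FakeConnection get(String user) {
        long now = clock.getAsLong();
        FakeConnection c = byUser.get(user);
        if (c == null) {
            c = new FakeConnection("hconnection-" + (++created), now);
            byUser.put(user, c);
        }
        c.lastUsedMillis = now;
        return c;
    }

    // One pass of the background chore: close and drop idle connections.
    synchronized void evictIdle() {
        long now = clock.getAsLong();
        byUser.values().removeIf(c -> {
            if (now - c.lastUsedMillis > maxIdleMillis) {
                c.close();
                return true;
            }
            return false;
        });
    }
}

final class IdleCacheRaceDemo {
    public static void main(String[] args) throws IOException {
        long[] now = {0L}; // manually advanced clock, in milliseconds
        IdleConnectionCache cache = new IdleConnectionCache(() -> now[0], 600_000L);

        FakeConnection held = cache.get("alice"); // caller obtains the connection...
        now[0] = 600_001L;                        // ...just over ten minutes pass...
        cache.evictIdle();                        // ...and the chore closes it.

        try {
            held.put("row1"); // the stale handle fails at use time
        } catch (IOException e) {
            System.out.println("stale handle: " + e.getMessage());
            cache.get("alice").put("row1"); // re-fetch yields a fresh connection
            System.out.println("retry on a fresh connection succeeded");
        }
    }
}
```

In this model the failure surfaces only when the held handle is used, which matches the observation above that the error is reported at flush time rather than when the connection is obtained.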

    Apache's JIRA Issue Tracker | 1 year ago | stack
    java.io.IOException: hconnection-0xa8e1bf9 closed
  3. 0

    [HBASE-14533] Thrift client gets "AsyncProcess: Failed to get region location .... closed" - ASF JIRA

    apache.org | 12 months ago
    java.io.IOException: hconnection-0x17f35451 closed
  5. 0

    Thrift Server is crashing due to "RetriesExhaustedException"

    Stack Overflow | 1 year ago | freebourn
    org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
  6. 0

    [HBASE-14533] Thrift client gets "AsyncProcess: Failed to get region location .... closed" - ASF JIRA

    apache.org | 12 months ago
    java.io.IOException: hconnection-0xa8e1bf9 closed

    Root Cause Analysis

    1. java.io.IOException

      hconnection-0x5314972b closed

      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion()
    2. HBase - Client
      HTableWrapper.getScanner
      1. org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
      2. org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
      3. org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
      4. org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
      5. org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
      6. org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
      7. org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
      8. org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
      9. org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
      10. org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
      11. org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
      12. org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
      13. org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
      14. org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
      15. org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
      16. org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
      17. org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
      17 frames
    3. Phoenix Core
      StatisticsScanner$StatisticsScannerCallable.call
      1. org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
      2. org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
      3. org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
      4. org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
      4 frames
    4. Java RT
      Thread.run
      1. java.util.concurrent.FutureTask.run(FutureTask.java:266)
      2. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      3. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      4. java.lang.Thread.run(Thread.java:745)
      4 frames