org.apache.spark.SparkException: Exception thrown in awaitResult


  • Spark applications running on Mesos throw the following exception on exit:
    {noformat}
    16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
    org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        ... 2 more
    Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        ... 4 more
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    {noformat}
    The error does not affect the application's results. To reproduce it, launch a spark-shell, run the following commands, and then exit:
    {code}
    val rdd = sc.parallelize(1 to 10, 10)
    rdd.map { _ + 1 }.collect()
    {code}
    The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop the executors and also stops the driver endpoint, without waiting for the executors to actually stop. MesosCoarseGrainedSchedulerBackend.stop() then waits, up to a timeout, for the executors to stop. During that wait, MesosCoarseGrainedSchedulerBackend.statusUpdate() is generally called to update the executors' status, which in turn calls removeExecutor(). By then, however, the driver endpoint is no longer available.
    by Sun Rui
  • GitHub comment 72#253767564
    via GitHub by mooperd
  • While testing Spark jobs on a VM, we noticed that the Spark job logs many heartbeat retry messages in the master log, although the Spark program itself ran fine. Here is the stack trace:
    {code}
    2016-04-29 05:04:05,963 - WARN [driver-heartbeater:o.a.s.Logging$class@91] - Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@6d61f2ad,BlockManagerId(driver, localhost, 49484))] in 3 attempts
    org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:449) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_75]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
    Caused by: org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        ... 13 common frames omitted
    {code}
    by Rohit Sinha
  • Apache Spark User List - GC overhead limit exceeded
    by Unknown author
  • No title
    by lei ju
  • Sparkling water executor error
    by hart jo
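
The shutdown ordering described in the Mesos report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ToyDispatcher`, `ToyBackend`, and the `guarded` flag are hypothetical names invented for illustration and are not Spark classes. The sketch assumes that `stop()` removes the driver endpoint before late executor status updates arrive, and it shows one possible guard (skipping the removal message once `stop()` has been called). It does not claim to be how Spark itself handles the issue.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for an RPC dispatcher: endpoints are looked up by name.
final class ToyDispatcher {
  private val endpoints = mutable.Set.empty[String]

  def register(name: String): Unit = synchronized { endpoints += name }
  def unregister(name: String): Unit = synchronized { endpoints -= name }

  // Fails when the target endpoint is gone, mirroring "Could not find ...".
  def ask(name: String, message: String): String = synchronized {
    if (!endpoints.contains(name)) {
      throw new IllegalStateException(s"Could not find $name.")
    }
    s"ack: $message"
  }
}

// Hypothetical scheduler backend; `guarded` toggles the illustrative fix.
final class ToyBackend(dispatcher: ToyDispatcher, guarded: Boolean) {
  private val DriverEndpoint = "CoarseGrainedScheduler"
  @volatile private var stopCalled = false
  dispatcher.register(DriverEndpoint)

  // Assumption: stop() tears down the driver endpoint immediately,
  // without waiting for executors to report that they have finished.
  def stop(): Unit = {
    stopCalled = true
    dispatcher.unregister(DriverEndpoint)
  }

  // A late status update tries to tell the driver endpoint to remove the executor.
  def statusUpdate(executorId: String, state: String): Option[String] = {
    if (guarded && stopCalled) {
      None // endpoint is known to be gone, so skip the message
    } else {
      val msg = s"RemoveExecutor($executorId,Executor finished with state $state)"
      Some(dispatcher.ask(DriverEndpoint, msg))
    }
  }
}

object ShutdownRaceDemo {
  def main(args: Array[String]): Unit = {
    val unguarded = new ToyBackend(new ToyDispatcher, guarded = false)
    unguarded.stop()
    // The update arrives after stop(): the ask fails because the endpoint is gone.
    println(s"unguarded: ${Try(unguarded.statusUpdate("1", "FINISHED"))}")

    val guarded = new ToyBackend(new ToyDispatcher, guarded = true)
    guarded.stop()
    // With the guard, the late update is ignored instead of throwing.
    println(s"guarded: ${guarded.statusUpdate("1", "FINISHED")}")
  }
}
```

In this model the unguarded backend fails in the same place as the reported trace (the lookup of the driver endpoint), while the guarded one simply ignores updates that arrive after shutdown has begun. Whether dropping those updates is acceptable depends on what else relies on them, so the guard is a design choice to weigh rather than a given.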
