org.apache.spark.SparkException


  • While testing Spark jobs on a VM, we noticed that the Spark job logs a lot of heartbeat retry messages in the master log. The Spark program ran fine, though. Here is the stack trace:
    {code}
    2016-04-29 05:04:05,963 - WARN [driver-heartbeater:o.a.s.Logging$class@91] - Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@6d61f2ad,BlockManagerId(driver, localhost, 49484))] in 3 attempts
    org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:449) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_75]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
    Caused by: org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        ... 13 common frames omitted
    {code}
    by Rohit Sinha
  • Apache Spark User List - GC overhead limit exceeded
    by Unknown author
  • No title
    by lei ju
  • Sparkling water executor error
    by hart jo
  • Spark applications running on Mesos throw an exception upon exit, as follows:
    {noformat}
    16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
    org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        ... 2 more
    Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        ... 4 more
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    {noformat}
    The application's result is not affected by this error. The issue can be reproduced by launching a spark-shell and exiting after running the following commands:
    {code}
    val rdd = sc.parallelize(1 to 10, 10)
    rdd.map { _ + 1 }.collect()
    {code}
    The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop the executors and also stops the driver endpoint, without waiting for the executors to actually stop. MesosCoarseGrainedSchedulerBackend.stop() then waits, up to a timeout, for the executors to stop. During that wait, MesosCoarseGrainedSchedulerBackend.statusUpdate() is generally called to update the executors' status, which in turn calls removeExecutor(). By that time, the driver endpoint is no longer available.
    by Sun Rui
  • GitHub comment 572#249369489
    via GitHub by car2008
  • GitHub comment 72#253767564
    via GitHub by mooperd
    • org.apache.spark.SparkException: Exception thrown in awaitResult
          at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
          at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
          at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
          at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
          at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
          at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
          at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
          at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
          at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
          at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:414)
          at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.executorTerminated(CoarseMesosSchedulerBackend.scala:553)
          at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.statusUpdate(CoarseMesosSchedulerBackend.scala:494)
      Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
          at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:162)
          at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
          at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
          at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
          at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
          ... 4 more
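
The Mesos reports above attribute the "Could not find CoarseGrainedScheduler" error to shutdown ordering: the driver endpoint is stopped first, and a late executor status update then tries to message it. The following is only a toy sketch of that ordering, written without Spark. `ToyDispatcher`, `ShutdownOrderSketch`, and `lateStatusUpdate` are hypothetical names invented for illustration, and `IllegalStateException` stands in for `SparkException`; none of this is Spark's actual implementation.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for an RPC dispatcher; NOT Spark's Dispatcher class.
final class ToyDispatcher {
  private val endpoints = mutable.Map.empty[String, String => Unit]

  // Make an endpoint reachable under the given name.
  def register(name: String)(handler: String => Unit): Unit =
    endpoints.update(name, handler)

  // Unregister the endpoint; later posts to it will fail.
  def stop(name: String): Unit = {
    endpoints.remove(name)
    ()
  }

  // Posting to a missing endpoint throws, as in the traces above.
  // IllegalStateException is used here in place of SparkException.
  def post(name: String, message: String): Unit =
    endpoints.get(name) match {
      case Some(handler) => handler(message)
      case None =>
        throw new IllegalStateException(
          s"Could not find $name or it has been stopped.")
    }
}

object ShutdownOrderSketch {
  private val Endpoint = "CoarseGrainedScheduler"

  // Simulates one late RemoveExecutor-style message under the two possible
  // shutdown orders and returns whether delivering it succeeded.
  def lateStatusUpdate(stopEndpointFirst: Boolean): Try[Unit] = {
    val dispatcher = new ToyDispatcher
    dispatcher.register(Endpoint)(_ => ())

    // Order described in the report: endpoint stopped before executors finish.
    if (stopEndpointFirst) dispatcher.stop(Endpoint)

    val result = Try(dispatcher.post(Endpoint, "RemoveExecutor(1)"))

    // Alternative order: stop the endpoint only after the late update arrived.
    if (!stopEndpointFirst) dispatcher.stop(Endpoint)
    result
  }

  def main(args: Array[String]): Unit = {
    println(lateStatusUpdate(stopEndpointFirst = true))
    println(lateStatusUpdate(stopEndpointFirst = false))
  }
}
```

In this sketch the late message fails only when the endpoint is stopped first, which matches the report's observation that the error appears at exit and does not affect the application's result.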
