org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint

GitHub | mooperd | 2 months ago
  1. 0

    GitHub comment 72#253767564

    GitHub | 2 months ago | mooperd
    org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
  2. 0

    Spark applications running on Mesos throw an exception on exit, as follows:

    {noformat}
    16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
    org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
    Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        ... 2 more
    Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        ... 4 more
    Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
    {noformat}

    The applications' results are not affected by this error. The issue can be reproduced by launching a spark-shell, running the following commands, and then exiting:

    {code}
    val rdd = sc.parallelize(1 to 10, 10)
    rdd.map { _ + 1 }.collect()
    {code}

    The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop the executors and also stops the driver endpoint, without waiting for the executors to actually stop. MesosCoarseGrainedSchedulerBackend.stop() then still waits, up to a timeout, for the executors to stop. During that wait, MesosCoarseGrainedSchedulerBackend.statusUpdate() is generally called to update the executors' status, which in turn calls removeExecutor(). By that time, however, the driver endpoint is no longer available.

    Apache's JIRA Issue Tracker | 5 months ago | Sun Rui
    org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
  3. 0

    [jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

    mail-archive.com | 1 month ago
    org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
  5. 0

    Warning when running spark example on mesos: Could not find CoarseGrainedScheduler

    Stack Overflow | 2 months ago | weicong
    org.apache.spark.SparkException: Exception thrown in awaitResult
  6. 0

    While testing Spark jobs on a VM, we noticed that the Spark job logs a lot of heartbeat retry messages in the master log. The Spark program ran fine, though. Here is the stack trace:

    {code}
    2016-04-29 05:04:05,963 - WARN [driver-heartbeater:o.a.s.Logging$class@91] - Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@6d61f2ad,BlockManagerId(driver, localhost, 49484))] in 3 attempts
    org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:449) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:470) [co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_75]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_75]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
    Caused by: org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100) ~[co.cask.cdap.spark-assembly-1.6.1.jar:na]
        ... 13 common frames omitted
    {code}

    Cask Community Issue Tracker | 7 months ago | Rohit Sinha
    org.apache.spark.SparkException: Could not find HeartbeatReceiver or it has been stopped.

Root Cause Analysis

  1. org.apache.spark.SparkException

    Could not find CoarseGrainedScheduler.

    at org.apache.spark.rpc.netty.Dispatcher.postMessage()
  2. org.apache.spark
    RpcEndpointRef.askWithRetry
    1. org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
    2. org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
    3. org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
    4. org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
    5. org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
    6. org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
    6 frames
  3. Spark
    MesosCoarseGrainedSchedulerBackend.statusUpdate
    1. org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:418)
    2. org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:596)
    3. org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:533)
    3 frames
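
The shutdown ordering described in the SPARK-16522 report above can be modeled with a small toy program. This is only a sketch under stated assumptions: `ToyDispatcher`, `ToyBackend`, and the `guarded` switch are hypothetical names invented for illustration and are not Spark classes, and the guard shown is one possible way to avoid the late call, not necessarily what Spark itself does.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for an RPC dispatcher: endpoints are looked up by name.
final class ToyDispatcher {
  private val endpoints = mutable.Set.empty[String]
  def register(name: String): Unit = endpoints += name
  def unregister(name: String): Unit = endpoints -= name
  // Fails when the target endpoint has already been unregistered.
  def post(name: String, message: String): Unit =
    if (!endpoints.contains(name))
      throw new IllegalStateException(s"Could not find $name.")
}

// Hypothetical scheduler backend; `guarded` turns the illustrative stop check on.
final class ToyBackend(dispatcher: ToyDispatcher, guarded: Boolean) {
  private val endpointName = "CoarseGrainedScheduler"
  @volatile private var stopCalled = false

  dispatcher.register(endpointName)

  // Mirrors the reported order: the driver endpoint goes away first,
  // while executors may still be shutting down.
  def stop(): Unit = {
    stopCalled = true
    dispatcher.unregister(endpointName)
  }

  // Called when the cluster manager reports that an executor has finished.
  def statusUpdate(executorId: String): Unit =
    if (!(guarded && stopCalled))
      dispatcher.post(endpointName, s"RemoveExecutor($executorId)")
}

object ShutdownRaceSketch {
  // Stops the backend first, then delivers a late status update.
  def lateUpdate(guarded: Boolean): Try[Unit] = {
    val backend = new ToyBackend(new ToyDispatcher, guarded)
    backend.stop()
    Try(backend.statusUpdate("1"))
  }

  def main(args: Array[String]): Unit = {
    println(s"unguarded: ${lateUpdate(guarded = false)}")
    println(s"guarded:   ${lateUpdate(guarded = true)}")
  }
}
```

In this model the unguarded backend fails on the late status update because the endpoint is already gone, while the guarded backend silently skips the message once `stop()` has been called.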