java.lang.IllegalStateException: SparkContext has been shutdown

JIRA | Avkash Chauhan | 4 months ago
  1.

    Yesterday we reproduced this problem locally with CDH 5.7.1 and Sparkling Water 1.6.5 using the code below:

        l = [('Alice', 1)]
        df = sqlContext.createDataFrame(l)
        df.count()
        import pysparkling
        hc = pysparkling.H2OContext(sc).start()
        df.count()

    After the problem occurs, the H2O server is still accessible and working: it reports node status correctly and all H2O nodes are healthy. Calling the following shows the same problem:

        hc.as_h2o_frame(df)

    Command line:

        bin/pysparkling --num-executors 2 --executor-memory 6g --driver-memory 6g --master yarn-client --conf spark.dynamicAllocation.enabled=false --conf spark.ext.h2o.repl.enabled=false

    A simple test case: before the H2O context is initialized, all actions on a Spark DataFrame work. However, immediately after starting an H2O context with H2OContext(sc).start(), the DataFrame becomes unusable, even when the H2O context is never actually used. The line `java.lang.IllegalStateException: SparkContext has been shutdown` is only the aftermath. One more thing: we tested H2O on a single-node cluster with Spark 1.6.0 and 1.6.1 and both work fine. The problem is that H2O fails when it runs on CDH 5.7.1.
    File "/home/cdhadmin/sparkling-water-1.6.5/py/dist/h2o_pysparkling_1.6-1.6.5-py2.7.egg/pysparkling/context.py", line 175, in as_h2o_frame
    File "/home/cdhadmin/sparkling-water-1.6.5/py/dist/h2o_pysparkling_1.6-1.6.5-py2.7.egg/pysparkling/utils.py", line 40, in _as_h2o_frame_from_dataframe
    File "/vol1/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
    File "/vol1/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    File "/vol1/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
    File "/vol1/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o90.count.
    : java.lang.IllegalStateException: SparkContext has been shutdown
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1940)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
        at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
        at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
        at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
        at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
        at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
        at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1515)
        at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1514)
        at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
        at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1514)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
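The failure mode described above — a DataFrame handle that still exists in Python but whose backing context has been shut down underneath it — can be sketched in plain Python. This is only an illustration of the mechanism, not pyspark code: every name here (`FakeSparkContext`, `FakeDataFrame`, `StoppedContextError`) is a hypothetical stand-in, and in real Spark the shutdown happens on the JVM side.

```python
class StoppedContextError(RuntimeError):
    """Stand-in for java.lang.IllegalStateException: SparkContext has been shutdown."""


class FakeSparkContext:
    """Illustrative stand-in for a SparkContext that can be stopped out from under its DataFrames."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

    def run_job(self):
        # Every action is routed through the context, mirroring SparkContext.runJob.
        if self.stopped:
            raise StoppedContextError("SparkContext has been shutdown")


class FakeDataFrame:
    """Holds only a reference to the context; the handle outlives the context."""

    def __init__(self, sc, rows):
        self._sc = sc
        self._rows = rows

    def count(self):
        self._sc.run_job()  # actions fail once the context is gone
        return len(self._rows)


sc = FakeSparkContext()
df = FakeDataFrame(sc, [("Alice", 1)])
assert df.count() == 1   # works while the context is alive

sc.stop()                # models what starting the H2O context effectively did here
try:
    df.count()
except StoppedContextError as e:
    print(e)             # SparkContext has been shutdown
```

The point of the sketch: `df` itself is never "destroyed" — it fails only because every action funnels through the (now stopped) context, which is why the exception surfaces on an innocent-looking `df.count()` rather than at the moment the context was killed.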

    JIRA | 4 months ago | Avkash Chauhan
    java.lang.IllegalStateException: SparkContext has been shutdown
  2.

    Running a Spark job on a Google Dataproc cluster and getting YARN ContainerExitStatus INVALID

    Stack Overflow | 10 months ago | Saulo Ricci
    I get `java.lang.IllegalStateException: SparkContext has been shutdown` across some threads; at that point I guess the YARN resource manager had already released the resources. This is what I get in my Spark application log: java.lang.IllegalStateException: SparkContext has been shutdown
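For the YARN case above, one common first check is whether dynamic allocation is reclaiming executors mid-job, which is why the JIRA report earlier on this page pins it off explicitly. A hedged example of a submission with resources pinned (the flags are standard Spark 1.6 options, but the job name and values are illustrative, not taken from the report):

```shell
# Illustrative spark-submit invocation: disable dynamic allocation so YARN
# does not reclaim executors, and fix the executor count/memory explicitly.
spark-submit \
  --master yarn-client \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 2 \
  --executor-memory 6g \
  --driver-memory 6g \
  my_job.py   # hypothetical application script
```

If the exception persists with allocation pinned, the shutdown likely originates elsewhere (e.g. the driver stopping the context), not from YARN reclaiming containers.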
  3.

    Ardoris

    wordpress.com | 8 months ago
    java.lang.IllegalStateException: SparkContext has been shutdown
  4.

    ./oryx-run.sh batch SparkContext has been shutdown

    GitHub | 8 months ago | pinguo-xudianyang
    java.lang.IllegalStateException: SparkContext has been shutdown


    Root Cause Analysis

    1. java.lang.IllegalStateException

      SparkContext has been shutdown

      at org.apache.spark.SparkContext.runJob()
    2. Spark
      RDD.collect
      1. org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
      2. org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
      3. org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
      4. org.apache.spark.SparkContext.runJob(SparkContext.scala:1940)
      5. org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
      6. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
      7. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
      8. org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
      9. org.apache.spark.rdd.RDD.collect(RDD.scala:926)
      9 frames
    3. Spark Project SQL
      DataFrame.count
      1. org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
      2. org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
      3. org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
      4. org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
      5. org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
      6. org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
      7. org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
      8. org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
      9. org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1515)
      10. org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1514)
      11. org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
      12. org.apache.spark.sql.DataFrame.count(DataFrame.scala:1514)
      12 frames
    4. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:498)
      4 frames
    5. Py4J
      GatewayConnection.run
      1. py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
      2. py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
      3. py4j.Gateway.invoke(Gateway.java:259)
      4. py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      5. py4j.commands.CallCommand.execute(CallCommand.java:79)
      6. py4j.GatewayConnection.run(GatewayConnection.java:209)
      6 frames
    6. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:745)
      1 frame