N2.AGT: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode

        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
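The innermost Python frame is pi.py, line 36, in f, and the error is SystemError: unknown opcode. That error most often means an interpreter is executing bytecode compiled by a different Python version: PySpark pickles the function f on the driver, code object included, and the YARN executor runs it with whatever Python it launches locally. The standalone sketch below (it does not use PySpark) shows the kind of up-front check that turns this into a clear error: it tags marshalled bytecode with importlib.util.MAGIC_NUMBER and refuses to load it under a different tag. The names ship and load are hypothetical, and f is only modeled on the sampler in Spark's pi.py example.

```python
import importlib.util
import marshal
import random
import types


def f(_):
    # Modeled on the Monte Carlo sampler in Spark's examples/src/main/python/pi.py.
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0


def ship(func):
    """Hypothetical 'driver' side: raw bytecode tagged with this interpreter's magic number."""
    return importlib.util.MAGIC_NUMBER, marshal.dumps(func.__code__)


def load(payload, local_magic=importlib.util.MAGIC_NUMBER):
    """Hypothetical 'worker' side: reject foreign bytecode instead of hitting an unknown opcode later."""
    magic, blob = payload
    if magic != local_magic:
        raise RuntimeError(
            "driver and worker Python versions differ; "
            "point both at the same interpreter (e.g. via PYSPARK_PYTHON)"
        )
    # Rebuild a callable from the code object, resolving globals such as `random` here.
    return types.FunctionType(marshal.loads(blob), globals())


if __name__ == "__main__":
    sampler = load(ship(f))
    n = 10000
    hits = sum(sampler(i) for i in range(n))
    print("Pi is roughly", 4.0 * hits / n)
```

In practice the usual remedy is to point PYSPARK_PYTHON at the same Python version on the driver and on every YARN node. Newer PySpark releases also compare the driver and worker Python versions before running a task; Spark 1.0.1, the version in this trace, fails later with the opcode error instead.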

spark-user | Andrew Or | 2 years ago

    Re: pyspark yarn got exception

    Root Cause Analysis

    1. N2.AGT

      org.apache.spark.api.python.PythonException: SystemError: unknown opcode (raised in pi.py, line 36, in f; the full Python traceback appears at the top of the page)
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply()
    2. Spark
      DAGScheduler$$anonfun$abortStage$1.apply
      1. org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
      2. org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
      2 frames
    3. Scala
      ArrayBuffer.foreach
      1. scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      2. scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      2 frames
    4. Spark
      DAGScheduler$$anonfun$handleTaskSetFailed$1.apply
      1. org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
      2. org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
      3. org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
      3 frames
    5. Scala
      Option.foreach
      1. scala.Option.foreach(Option.scala:236)
      1 frame
    6. Spark
      DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse
      1. org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
      2. org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
      2 frames
    7. Akka Actor
      ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec
      1. akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
      2. akka.actor.ActorCell.invoke(ActorCell.scala:456)
      3. akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
      4. akka.dispatch.Mailbox.run(Mailbox.scala:219)
      5. akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
      5 frames
    8. Scala
      ForkJoinTask.doExec
      1. scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      1 frame
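The listing above collapses consecutive JVM frames from the same library into one numbered group, names each group after its last frame, and counts the frames. A minimal sketch of that grouping, assuming a hypothetical package-prefix table (the site's actual classification rules are not shown on this page):

```python
from itertools import groupby

# Hypothetical prefix -> label table; an assumption, not the site's real rules.
LIBRARIES = [
    ("org.apache.spark.", "Spark"),
    ("akka.", "Akka Actor"),
    ("scala.", "Scala"),
    ("java.", "Java"),
]


def library_of(frame: str) -> str:
    """Label a frame by the first matching package prefix."""
    for prefix, label in LIBRARIES:
        if frame.startswith(prefix):
            return label
    return "Other"


def short_name(frame: str) -> str:
    """'a.b.Class.method(File.scala:1)' -> 'Class.method'."""
    qualified = frame.split("(", 1)[0]
    return ".".join(qualified.rsplit(".", 2)[-2:])


def group_frames(frames):
    """Collapse consecutive same-library frames into (label, header, frame_count)."""
    groups = []
    for label, run in groupby(frames, key=library_of):
        run = list(run)
        # The header is the short name of the group's last frame.
        groups.append((label, short_name(run[-1]), len(run)))
    return groups
```

Fed the frames of groups 4 through 8, this should reproduce their labels, headers and frame counts; where a group starts depends only on where consecutive frames switch libraries.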