org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 30.0 failed 4 times, most recent failure: Lost task 0.3 in stage 30.0 (TID 52, ph-hdp-prd-dn02): org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/data/0/yarn/nm/usercache/phanalytics-test/appcache/application_1474532589728_2983/container_e203_1474532589728_2983_01_000014/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/data/0/yarn/nm/usercache/analytics-test/appcache/application_1474532589728_2983/container_e203_1474532589728_2983_01_000014/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/spark-latest/python/pyspark/rdd.py", line 2371, in pipeline_func
  File "/usr/local/spark-latest/python/pyspark/rdd.py", line 2371, in pipeline_func
  File "/usr/local/spark-latest/python/pyspark/rdd.py", line 317, in func
  File "/usr/local/spark-latest/python/pyspark/rdd.py", line 1792, in combineLocally
  File "/data/0/yarn/nm/usercache/phanalytics-test/appcache/application_1474532589728_2983/container_e203_1474532589728_2983_01_000014/pyspark.zip/pyspark/shuffle.py", line 238, in mergeValues
    d[k] = comb(d[k], v) if k in d else creator(v)
  File "<ipython-input-11-ec09929e01e4>", line 6, in <lambda>
TypeError: 'int' object is not callable

Stack Overflow | Edamame | 2 months ago
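The merge step named in the traceback (`mergeValues` in pyspark/shuffle.py) evaluates `d[k] = comb(d[k], v) if k in d else creator(v)`, so this TypeError means the object handed to `reduceByKey` as the combiner was an int, not a function, typically because a name such as `max` was rebound to an integer earlier in the notebook. A minimal plain-Python sketch (no Spark cluster needed; `merge_values` and the sample pairs are hypothetical stand-ins for the PySpark internals) reproduces the failure:

```python
def merge_values(pairs, creator, comb):
    """Mimic pyspark.shuffle's mergeValues: fold each (k, v) into a dict."""
    d = {}
    for k, v in pairs:
        d[k] = comb(d[k], v) if k in d else creator(v)
    return d

pairs = [("a", 1), ("a", 2), ("b", 3)]

# Correct usage: both arguments are callables.
print(merge_values(pairs, creator=lambda v: v, comb=lambda x, y: x + y))
# {'a': 3, 'b': 3}

# Buggy usage: `comb` is an int (e.g. the builtin `max` was shadowed by an
# assignment), so the first repeated key triggers the same TypeError as above.
max = 10  # shadows the builtin
try:
    merge_values(pairs, creator=lambda v: v, comb=max)
except TypeError as e:
    print(e)  # 'int' object is not callable
```

Restarting the interpreter, or deleting the shadowing binding (`del max`), restores the builtin and the combiner call succeeds again.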
  1.

    pyspark: sort in reduceByKey error: in <lambda> TypeError: 'int' object is not callable

    Stack Overflow | 2 months ago | Edamame
    (same stack trace as shown at the top of this page)
  2.

    Py4Java: ImportError: No module named numpy when running Python shell for Apache Spark

    Stack Overflow | 2 years ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/Users/m/workspace/spark-1.2.0-bin-hadoop2.4/python/pyspark/worker.py", line 90, in main command = pickleSer._read_with_length(infile) File "/Users/m/workspace/spark-1.2.0-bin-hadoop2.4/python/pyspark/serializers.py", line 151, in _read_with_length return self.loads(obj) File "/Users/m/workspace/spark-1.2.0-bin-hadoop2.4/python/pyspark/serializers.py", line 396, in loads return cPickle.loads(obj) File "/Users/m/workspace/spark-1.2.0-bin-hadoop2.4/python/pyspark/mllib/__init__.py", line 24, in <module> import numpy ImportError: No module named numpy
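`ImportError: No module named numpy` on a worker usually means the interpreter the executors launch (the one named in `PYSPARK_PYTHON`, or the system default) is not the interpreter numpy was installed into. A small diagnostic sketch (plain Python, no Spark required) prints which interpreter is running and whether numpy resolves for it:

```python
import importlib.util
import sys

# Which interpreter is executing? Workers need numpy installed for exactly
# this interpreter, not merely for some other Python on the machine.
print(sys.executable)

spec = importlib.util.find_spec("numpy")
if spec is None:
    # Install for exactly this interpreter, e.g.:
    #   /path/to/python -m pip install numpy
    print("numpy is NOT importable from this interpreter")
else:
    print("numpy found at", spec.origin)
```

Running this snippet inside a `rdd.mapPartitions` task (rather than on the driver) would show the executors' interpreter, which is where the import actually fails.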
  3.

    Why does Apache PySpark top() fail when the RDD contains a user defined class?

    Stack Overflow | 2 years ago | user3279453
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 49.0 failed 1 times, most recent failure: Lost task 1.0 in stage 49.0 (TID 99, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "C:\Programs\Apache\Spark\spark-1.2.0-bin-hadoop2.4\python\pyspark\worker.py", line 107, in main process() File "C:\Programs\Apache\Spark\spark-1.2.0-bin-hadoop2.4\python\pyspark\worker.py", line 98, in process serializer.dump_stream(func(split_index, iterator), outfile) File "C:\Programs\Apache\Spark\spark-1.2.0-bin-hadoop2.4\python\pyspark\serializers.py", line 231, in dump_stream bytes = self.serializer.dumps(vs) File "C:\Programs\Apache\Spark\spark-1.2.0-bin-hadoop2.4\python\pyspark\serializers.py", line 393, in dumps return cPickle.dumps(obj, 2) PicklingError: Can't pickle <class '__main__.TestClass'>: attribute lookup __main__.TestClass failed
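pickle serializes a class by its module-qualified name, so a class defined interactively in `__main__` cannot be looked up again by the worker process that unpickles it, which is exactly the failed attribute lookup above; the usual fix is to move the class into a real module shipped to the executors (e.g. via `SparkContext.addPyFile`). A plain-Python reproduction sketch (no Spark; `define_locally` is a hypothetical helper that makes the class non-importable by name, just as a REPL definition is for a Spark worker):

```python
import pickle

def define_locally():
    # Defined inside a function, so it is not importable as
    # <module>.TestClass by name -- analogous to a class typed into the
    # REPL, which the Spark worker process cannot re-import.
    class TestClass:
        pass
    return TestClass

TestClass = define_locally()

try:
    pickle.dumps(TestClass())
except (pickle.PicklingError, AttributeError) as e:
    # Depending on the Python version this surfaces as PicklingError or
    # AttributeError; either way the by-name class lookup fails.
    print("pickle failed:", e)
```

Moving `TestClass` to the top level of an importable module makes the same `pickle.dumps` call succeed, because the by-name lookup then resolves on both sides.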
  4.

    Spark Streaming Kafka - Job always quits when RDD contains an actual message

    Stack Overflow | 2 years ago | CruLPlay
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 9, localhost): org.apache.spark.util.TaskCompletionListenerException
  5.

    Py4Java: ImportError: No module named numpy when running Python shell for Apache Spark

    fatal-errors.com | 1 year ago
    (same stack trace as item 2 above)

Root Cause Analysis

  1. org.apache.spark.SparkException

    Job aborted due to stage failure: Task 0 in stage 30.0 failed 4 times, most recent failure: Lost task 0.3 in stage 30.0 (TID 52, ph-hdp-prd-dn02): org.apache.spark.api.python.PythonException, ending in TypeError: 'int' object is not callable (full Python traceback shown at the top of this page)

    at org.apache.spark.api.python.PythonRunner$$anon$1.read()
  2. Spark
    Executor$TaskRunner.run
    1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    6. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    7. org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:390)
    8. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    9. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    10. org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    11. org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    12. org.apache.spark.scheduler.Task.run(Task.scala:85)
    13. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    13 frames
  3. Java RT
    Thread.run
    1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    3. java.lang.Thread.run(Thread.java:745)
    3 frames
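Given the root cause above, the remedy is to make sure the argument to `reduceByKey` is a genuine two-argument callable and that builtins such as `sorted` or `max` have not been rebound to integers earlier in the session. A hedged sketch of a per-key sort combiner, simulated here with plain `functools.reduce` on hypothetical data (no Spark needed; `merge_sorted` is an illustrative name):

```python
from functools import reduce

def merge_sorted(xs, ys):
    # A valid reduceByKey combiner: takes two per-key values (lists here)
    # and returns one, keeping the result sorted. In PySpark this pattern
    # would be rdd.mapValues(lambda v: [v]).reduceByKey(merge_sorted).
    return sorted(xs + ys)

# Simulate the per-key reduction for one key:
values_for_key = [[3], [1], [2]]
print(reduce(merge_sorted, values_for_key))  # [1, 2, 3]
```

Because `reduceByKey` applies the combiner pairwise in an arbitrary order, it must be associative and accept two values of the same type, which is why each single value is wrapped in a list first.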