org.apache.spark.SparkException

Job aborted due to stage failure: Task 1 in stage 259.0 failed 1 times, most recent failure: Lost task 1.0 in stage 259.0 (TID 859, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):

Why do I get the error on count()?

Trace:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 2, localhost): org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 172, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 317, in func
    return f(iterator)
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "", line 3, in
IndexError: list index out of range
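A common cause of IndexError: list index out of range inside a PySpark lambda is indexing a field that does not exist for every row (for example, an empty or short line after split()). Because RDD transformations are lazy, the bad index is only hit when an action such as count() forces evaluation, which is why the error appears to come from count(). A minimal sketch of that failure mode and a defensive fix, assuming delimited text input (the sample data and column index below are illustrative, not taken from the question):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # on Databricks, sc is already defined

# Illustrative data: the last line has fewer fields than the others.
rdd = sc.parallelize(["a,b,c", "d,e,f", "g"])

# This version raises IndexError in the Python worker, but only once
# count() forces the lazy map to run:
# rdd.map(lambda line: line.split(",")[2]).count()

# Guard against short rows before indexing the field:
fields = rdd.map(lambda line: line.split(","))
third_col = fields.filter(lambda f: len(f) > 2).map(lambda f: f[2])
print(third_col.count())  # 2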


Solutions on the web (2485)

via Stack Overflow by Tronald Dump, 6 months ago
Job aborted due to stage failure: Task 1 in stage 259.0 failed 1 times, most recent failure: Lost task 1.0 in stage 259.0 (TID 859, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): Why do I get the error on

via Stack Overflow
Job aborted due to stage failure: Task 1 in stage 208.0 failed 1 times, most recent failure: Lost task 1.0 in stage 208.0 (TID 11930, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks

via Stack Overflow
Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 30, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/spark

via Stack Overflow
Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/spark

via Stack Overflow by John Constantine, 11 months ago
Job aborted due to stage failure: Task 4 in stage 24.0 failed 1 times, most recent failure: Lost task 4.0 in stage 24.0 (TID 76, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/spark

via GitHub
Job aborted due to stage failure: Task 44 in stage 1.0 failed 4 times, most recent failure: Lost task 44.3 in stage 1.0 (TID 96, 172.16.10.54): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/opt/mapr/spark

via Stack Overflow
Job aborted due to stage failure: Task 0 in stage 78.0 failed 1 times, most recent failure: Lost task 0.0 in stage 78.0 (TID 90, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/opt/spark

via Stack Overflow
Job aborted due to stage failure: Task 6 in stage 688.0 failed 4 times, most recent failure: Lost task 6.3 in stage 688.0 (TID 8308, 10.179.246.224): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home

via spark-issues by Xiangrui Meng (JIRA), 1 year ago
Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/Users/meng/src/spark

Stack trace

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 259.0 failed 1 times, most recent failure: Lost task 1.0 in stage 259.0 (TID 859, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):

Why do I get the error on count()?

Trace:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 2, localhost): org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 172, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 317, in func
    return f(iterator)
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "", line 3, in
IndexError: list index out of range
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1891)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1904)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1917)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:911)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 172, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 2371, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/databricks/spark/python/pyspark/rdd.py", line 317, in func
    return f(iterator)
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/databricks/spark/python/pyspark/rdd.py", line 1008, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "", line 3, in
IndexError: list index out of range
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
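The Caused by section is the part to read: the IndexError is raised in the Python worker, and the JVM frames above it only report the task failure back to the driver. One way to locate the offending input without killing the whole task is to tag bad rows instead of letting them raise; this is an illustrative sketch (the helper name, column index, and sample data are assumptions, not from the trace):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # on Databricks, sc is already defined
rdd = sc.parallelize(["a,b,c", "g"])  # illustrative data

def tag_row(line):
    parts = line.split(",")
    try:
        return ("ok", parts[2])
    except IndexError:
        return ("bad", line)  # keep the raw line so it can be inspected

tagged = rdd.map(tag_row)
print(tagged.filter(lambda t: t[0] == "bad").take(5))  # [('bad', 'g')]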
