org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 98.0 failed 1 times, most recent failure: Lost task 0.0 in stage 98.0 (TID 338, localhost): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$3: (struct) => vector)

Stack Overflow | Praveen | 1 month ago
tip
  1. samebug tip
     Use java.sql.Timestamp or java.sql.Date to map a BsonDateTime field coming from MongoDB; Spark has no encoder for the raw BSON type. A sketch follows after this list.
  2. GitHub comment 225#249287425
     GitHub | 6 months ago | kevinushey
     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 12, localhost): java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.spark.ml.linalg.Vector
  3. Working on 1.6.2, broken on 2.0
     Apache's JIRA Issue Tracker | 6 months ago | Egor Pahomov
     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, ip-10-1-101-18.ec2.internal): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
     Query: select * from logs.a where year=2016 and month=9 and day=14 limit 100
     java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
       at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)
       at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:233)
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
       at org.apache.spark.scheduler.Task.run(Task.scala:86)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  4. GitHub comment 951#288760587
     GitHub | 5 days ago | cvjones17
     org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 20.0 failed 4 times, most recent failure: Lost task 4.3 in stage 20.0 (TID 201, 172.31.30.96, executor 0): java.lang.IndexOutOfBoundsException: 1
  5. scala.MatchError on reading metadata
     GitHub | 2 months ago | drudim
     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): scala.MatchError: true (of class java.lang.Boolean)
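    The tip in item 1, sketched out, assuming the MongoDB Spark Connector (the format string, connection URI, field names, and the Event case class are illustrative, not from the original report):

        import java.sql.Timestamp
        import org.apache.spark.sql.SparkSession

        // Map the BSON date field to java.sql.Timestamp (or java.sql.Date), not to
        // org.bson.BsonDateTime: Spark has no encoder for the raw BSON type.
        case class Event(id: String, created: Timestamp)

        val spark = SparkSession.builder()
          .appName("bson-date-mapping")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical read; the connector surfaces BSON dates as TimestampType,
        // so the Timestamp field in the case class lines up with the schema.
        val df = spark.read
          .format("com.mongodb.spark.sql.DefaultSource")
          .option("uri", "mongodb://localhost/db.events")
          .load()

        val events = df.select("id", "created").as[Event]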

    Root Cause Analysis

    1. org.apache.spark.SparkException

      Values to assemble cannot be null.

      at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply()
    2. Spark Project ML Library
      VectorAssembler$$anonfun$assemble$1.apply
      1. org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:160)
      2. org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:143)
      2 frames
    3. Scala
      WrappedArray.foreach
      1. scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      2. scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
      2 frames
    4. Spark Project ML Library
      VectorAssembler$$anonfun$3.apply
      1. org.apache.spark.ml.feature.VectorAssembler$.assemble(VectorAssembler.scala:143)
      2. org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:99)
      3. org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:98)
      3 frames
    5. Spark Project Catalyst
      GeneratedClass$GeneratedIterator.processNext
      1. org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      1 frame
    6. Spark Project SQL
      SparkPlan$$anonfun$4.apply
      1. org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      2. org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
      3. org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
      4. org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
      4 frames
    7. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
      2. org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
      3. org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      4. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      5. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
      6. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      7. org.apache.spark.scheduler.Task.run(Task.scala:86)
      8. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      8 frames
    8. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      3. java.lang.Thread.run(Unknown Source)
      3 frames
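
    Reading the root cause: VectorAssembler.assemble throws "Values to assemble cannot be null" when any of its input columns holds a null for some row, and the resulting task failure surfaces as the "Failed to execute user defined function" SparkException at the top of this page. A minimal sketch of the usual fix, dropping (or imputing) nulls in the feature columns before assembling; the column names are hypothetical stand-ins for your assembler's inputCols:

        import org.apache.spark.ml.feature.VectorAssembler
        import org.apache.spark.sql.DataFrame

        // Hypothetical feature columns; substitute the assembler's real inputCols.
        val featureCols = Array("f1", "f2", "f3")

        def assembleFeatures(df: DataFrame): DataFrame = {
          // VectorAssembler fails on null inputs, so drop the offending rows first...
          val cleaned = df.na.drop(featureCols)
          // ...or impute a default instead: df.na.fill(0.0, featureCols)

          new VectorAssembler()
            .setInputCols(featureCols)
            .setOutputCol("features")
            .transform(cleaned)
        }

    On Spark 2.4 and later the assembler can skip such rows itself via setHandleInvalid("skip"); the trace above appears to be from Spark 2.0, where cleaning the DataFrame up front is the workable route.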