org.apache.parquet.io.ParquetEncodingException: /Users/heuermh/working/adam/sorted.adam/part-r-00000.gz.parquet invalid: all the files must be contained in the root sorted.adam

GitHub | heuermh | 4 months ago
  1. 0

    GitHub comment 1340#270263389

  2. 0
    samebug tip
    I was missing a partitioning column because I did not specify the "basePath" option on read

  4. 0

    ClassCastException AvroKey cannot be cast to java.lang.Void

    Stack Overflow | 3 weeks ago | shashank
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to java.lang.Void

    Root Cause Analysis

    1. org.apache.parquet.io.ParquetEncodingException

      /Users/heuermh/working/adam/sorted.adam/part-r-00000.gz.parquet invalid: all the files must be contained in the root sorted.adam

      at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters()
    2. org.apache.parquet
      ParquetOutputCommitter.commitJob
      1. org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:444)
      2. org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
      3. org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
      4. org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
      4 frames
    3. Spark
      InstrumentedPairRDDFunctions.saveAsNewAPIHadoopFile
      1. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1145)
      2. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
      3. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074)
      4. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
      5. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
      6. org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
      7. org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074)
      8. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994)
      9. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
      10. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985)
      11. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
      12. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
      13. org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
      14. org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985)
      15. org.apache.spark.rdd.InstrumentedPairRDDFunctions.saveAsNewAPIHadoopFile(InstrumentedPairRDDFunctions.scala:477)
      15 frames
    4. org.bdgenomics.adam
      ADAMRDDFunctions$$anonfun$saveRddAsParquet$1.apply
      1. org.bdgenomics.adam.rdd.ADAMRDDFunctions$$anonfun$saveRddAsParquet$1.apply$mcV$sp(ADAMRDDFunctions.scala:159)
      2. org.bdgenomics.adam.rdd.ADAMRDDFunctions$$anonfun$saveRddAsParquet$1.apply(ADAMRDDFunctions.scala:143)
      3. org.bdgenomics.adam.rdd.ADAMRDDFunctions$$anonfun$saveRddAsParquet$1.apply(ADAMRDDFunctions.scala:143)
      3 frames
    5. Scala
      Option.fold
      1. scala.Option.fold(Option.scala:157)
      1 frame
    6. Spark
      Timer.time
      1. org.apache.spark.rdd.Timer.time(Timer.scala:48)
      1 frame
    7. org.bdgenomics.adam
      Vcf2ADAM.run
      1. org.bdgenomics.adam.rdd.ADAMRDDFunctions.saveRddAsParquet(ADAMRDDFunctions.scala:143)
      2. org.bdgenomics.adam.rdd.AvroGenomicRDD.saveAsParquet(GenomicRDD.scala:908)
      3. org.bdgenomics.adam.rdd.AvroGenomicRDD.saveAsParquet(GenomicRDD.scala:883)
      4. org.bdgenomics.adam.cli.Vcf2ADAM.run(Vcf2ADAM.scala:74)
      4 frames
    8. org.bdgenomics.utils
      BDGSparkCommand$class.run
      1. org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
      1 frame
    9. org.bdgenomics.adam
      ADAMMain.main
      1. org.bdgenomics.adam.cli.Vcf2ADAM.run(Vcf2ADAM.scala:53)
      2. org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:128)
      3. org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:68)
      4. org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
      4 frames
    10. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:497)
      4 frames
    11. Spark
      SparkSubmit.main
      1. org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
      2. org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
      3. org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
      4. org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
      5. org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      5 frames
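
The message pairs an absolute file path with a root printed as the bare name `sorted.adam`, and the top frame is `ParquetFileWriter.mergeFooters`. One plausible reading is that the output path was given relative to the working directory, so a containment check against it fails. The sketch below illustrates that reading only. It assumes the check is a plain string-prefix comparison and that the working directory was `/Users/heuermh/working/adam` (inferred from the path in the message). The class `RootContainmentSketch` and both helper methods are hypothetical and are not part of Parquet, Spark, or ADAM.

```java
// Illustrative sketch only: not the Parquet implementation.
final class RootContainmentSketch {

    // Hypothetical helper: treats "contained in the root" as a string-prefix test.
    static boolean isContainedInRoot(String root, String file) {
        return file.startsWith(root);
    }

    // Hypothetical helper: makes a relative root absolute against a working directory.
    static String toAbsolute(String workingDir, String root) {
        if (root.startsWith("/")) {
            return root; // already absolute
        }
        return workingDir.endsWith("/") ? workingDir + root : workingDir + "/" + root;
    }

    public static void main(String[] args) {
        // Path and root name copied from the exception message.
        String file = "/Users/heuermh/working/adam/sorted.adam/part-r-00000.gz.parquet";
        String relativeRoot = "sorted.adam";

        // Assumed working directory, inferred from the file path above.
        String absoluteRoot = toAbsolute("/Users/heuermh/working/adam", relativeRoot);

        System.out.println(isContainedInRoot(relativeRoot, file)); // prints "false"
        System.out.println(isContainedInRoot(absoluteRoot, file)); // prints "true"
    }
}
```

If this reading holds, passing an absolute output path to the save command would be one thing to try. The trace itself does not confirm that this resolves the error.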