org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema!

Apache's JIRA Issue Tracker | Dominic Ricard | 1 year ago
  1. 0

    [SPARK-11103] Parquet filters push-down may cause exception when schema merging is turned on - ASF JIRA

    apache.org | 11 months ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema!
  2. 0

    When evolving a schema in parquet files, spark properly expose all columns found in the different parquet files but when trying to query the data, it is not possible to apply a filter on a column that is not present in all files. To reproduce: *SQL:* {noformat} create table `table1` STORED AS PARQUET LOCATION 'hdfs://<SERVER>:<PORT>/path/to/table/id=1/' as select 1 as `col1`; create table `table2` STORED AS PARQUET LOCATION 'hdfs://<SERVER>:<PORT>/path/to/table/id=2/' as select 1 as `col1`, 2 as `col2`; create table `table3` USING org.apache.spark.sql.parquet OPTIONS (path "hdfs://<SERVER>:<PORT>/path/to/table"); select col1 from `table3` where col2 = 2; {noformat} The last select will output the following Stack Trace: {noformat} An error occurred when executing the SQL command: select col1 from `table3` where col2 = 2 [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema! at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59) at org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64) at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59) at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40) at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126) at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46) at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160) at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:155) at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace::26:25, org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:runInternal:SparkExecuteStatementOperation.scala:259, org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:run:SparkExecuteStatementOperation.scala:144, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:388, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:369, sun.reflect.GeneratedMethodAccessor134:invoke::-1, sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43, java.lang.reflect.Method:invoke:Method.java:497, org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78, org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36, org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63, java.security.AccessController:doPrivileged:AccessController.java:-2, javax.security.auth.Subject:doAs:Subject.java:422, org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1628, org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59, com.sun.proxy.$Proxy25:executeStatement::-1, org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:261, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:486, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298, org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39, org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39, org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56, org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285, java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142, java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617, java.lang.Thread:run:Thread.java:745], errorCode:0, errorMessage:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema! at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59) at org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180) at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64) at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59) at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40) at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126) at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46) at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160) at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:155) at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace:), Query: select col1 from `table3` where col2 = 2. [SQL State=HY000, DB Errorcode=500051] Execution time: 0.44s 1 statement failed. {noformat}

    Apache's JIRA Issue Tracker | 1 year ago | Dominic Ricard
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema!
  3. 0

    GitHub comment 375#176373964

    GitHub | 10 months ago | ankushreddy
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, istb1-l2-b12-09.hadoop.priv): org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'recordGroupPredictedMedianInsertSize' not found.
  4. Speed up your debug routine!

    Automated exception search integrated into your IDE

  5. 0

    [SPARK-6554] Cannot use partition columns in where clause when Parquet filter push-down is enabled - ASF JIRA

    apache.org | 12 months ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema!
  6. 0

    NullPointerException during connection creation.

    GitHub | 5 months ago | AbhiMadav
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, cdh52.vm.com): java.lang.NullPointerException

    Not finding the right solution?
    Take a tour to get the most out of Samebug.

    Tired of useless tips?

    Automated exception search integrated into your IDE

    Root Cause Analysis

    1. org.apache.spark.SparkException

      Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not found in schema!

      at org.apache.parquet.Preconditions.checkArgument()
    2. org.apache.parquet
      ParquetRecordReader.initialize
      1. org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
      2. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
      3. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
      4. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
      5. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
      6. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
      7. org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
      8. org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
      9. org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
      10. org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
      11. org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
      12. org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
      13. org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160)
      14. org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
      14 frames
    3. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:155)
      2. org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120)
      3. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      4. org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      5. org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
      6. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      7. org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      8. org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      9. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      10. org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      11. org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      12. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      13. org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      14. org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      15. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      16. org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      17. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      18. org.apache.spark.scheduler.Task.run(Task.scala:88)
      19. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      19 frames
    4. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames