org.apache.spark.sql.AnalysisException: Duplicate column(s) : "Int8", "String" found, cannot save to parquet format;

Stack Overflow | newbie_learner | 5 months ago

    Duplicate column exception when reading Parquet files from S3A using Spark


    Root Cause Analysis

    org.apache.spark.sql.AnalysisException: Duplicate column(s) : "Int8", "String" found, cannot save to parquet format;
        at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.checkConstraints(ParquetRelation.scala:190)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.dataSchema(ParquetRelation.scala:199)
        at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
        at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
        at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:395)
        at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:267)
        at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:1052)
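    The message says "save", but the trace shows it is raised while SQLContext.parquetFile resolves the Parquet schema on read: ParquetRelation.checkConstraints rejects a schema in which some field names occur more than once. The sketch below is a minimal, Spark-free illustration of that kind of check. It works on a plain list of field names, and the object name DuplicateColumnCheck and its methods are hypothetical. This is not Spark's own source. The assumption is an exact, case-sensitive name comparison.

```scala
// Hypothetical, Spark-free sketch of a duplicate-column check.
// It is NOT Spark's implementation; it only illustrates the idea
// behind the checkConstraints frame in the trace above.
object DuplicateColumnCheck {

  // Names that occur more than once, in first-seen order.
  // Assumption: names are compared exactly (case-sensitive).
  def duplicateColumns(fieldNames: Seq[String]): Seq[String] =
    fieldNames.distinct.filter(name => fieldNames.count(_ == name) > 1)

  // Returns the names unchanged if they are unique; otherwise throws with a
  // message shaped like the AnalysisException text in this trace.
  def checkConstraints(fieldNames: Seq[String]): Seq[String] = {
    val dups = duplicateColumns(fieldNames)
    if (dups.nonEmpty) {
      val quoted = dups.map(n => "\"" + n + "\"").mkString(", ")
      throw new IllegalArgumentException(
        s"Duplicate column(s) : $quoted found, cannot save to parquet format")
    }
    fieldNames
  }

  def main(args: Array[String]): Unit = {
    // Made-up field names, for illustration only.
    val names = Seq("id", "Int8", "String", "Int8", "String")
    try checkConstraints(names)
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

    Reporting names in first-seen order keeps the sketch's message deterministic. Whether Spark lists the duplicates in the same order is not something this sketch claims.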