org.apache.spark.sql.AnalysisException: Duplicate column(s) : "Int8", "String" found, cannot save to parquet format;

Stack Overflow | newbie_learner | 6 months ago

    Duplicate column exception when reading Parquet files from S3A using Spark


    Root Cause Analysis

    1. org.apache.spark.sql.AnalysisException

      Duplicate column(s) : "Int8", "String" found, cannot save to parquet format;

      at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.checkConstraints()
    2. org.apache.spark
      ParquetRelation.dataSchema
      1. org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.checkConstraints(ParquetRelation.scala:190)
      2. org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.dataSchema(ParquetRelation.scala:199)
    3. Spark Project SQL
      HadoopFsRelation.schema
      1. org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
      2. org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
    4. org.apache.spark
      LogicalRelation.<init>
      1. org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    5. Spark Project SQL
      SQLContext.parquetFile
      1. org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:395)
      2. org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:267)
      3. org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:1052)
      3 frames
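
The message above suggests that the names "Int8" and "String" each appear more than once in the schema Spark assembled for the Parquet files, and that the check in `ParquetRelation.checkConstraints` rejected it. A minimal sketch of that kind of check in plain Python may help when inspecting a list of column names before reading or writing. It is not Spark's implementation: the helper names `find_duplicate_columns` and `format_duplicates` are hypothetical, the sample column list is made up for illustration, and the comparison is assumed to be an exact, case-sensitive match.

```python
from collections import Counter


def find_duplicate_columns(field_names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper: compares names exactly (case-sensitive), which is
    an assumption about how the duplicate check behaves.
    """
    counts = Counter(field_names)
    seen = set()
    duplicates = []
    for name in field_names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def format_duplicates(duplicates):
    """Quote and join the names the way the exception message lists them."""
    return ", ".join('"{}"'.format(name) for name in duplicates)


# Made-up column list that would trigger the same kind of complaint.
names = ["Int8", "String", "Int8", "Double", "String"]
print(format_duplicates(find_duplicate_columns(names)))
```

If a list of field names gathered from the files' schemas yields a non-empty result here, renaming or dropping the repeated columns before the data is written is one possible way to avoid the exception.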