org.apache.spark.sql.AnalysisException


  • According to the [Hive Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union] for UNION ALL:
    {quote}
    The number and names of columns returned by each select_statement have to be the same. Otherwise, a schema error is thrown.
    {quote}
    Spark SQL silently swallows the error when the tables being combined with UNION ALL have the same number of columns but different names. Reproducible example (a defensive workaround is sketched after this item):
    {code}
    // This test is meant to run in spark-shell
    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode

    val ctx = sqlContext.asInstanceOf[HiveContext]
    import ctx.implicits._

    def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

    // Write a one-line JSON file and register it as a temp table
    def tempTable(name: String, json: String) = {
      val path = dataPath(name)
      new PrintWriter(path) { write(json); close }
      ctx.read.json("file://" + path).registerTempTable(name)
    }

    // Note category vs. cat names of first column
    tempTable("test_one", """{"category" : "A", "num" : 5}""")
    tempTable("test_another", """{"cat" : "A", "num" : 5}""")

    // +--------+---+
    // |category|num|
    // +--------+---+
    // |       A|  5|
    // |       A|  5|
    // +--------+---+
    //
    // Instead, an error should have been generated due to incompatible schema
    ctx.sql("select * from test_one union all select * from test_another").show

    // Cleanup
    new File(dataPath("test_one")).delete()
    new File(dataPath("test_another")).delete()
    {code}
    When the number of columns is different, Spark can even mix in datatypes. Reproducible example (requires a new spark-shell session):
    {code}
    // This test is meant to run in spark-shell
    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode

    val ctx = sqlContext.asInstanceOf[HiveContext]
    import ctx.implicits._

    def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

    // Write a one-line JSON file and register it as a temp table
    def tempTable(name: String, json: String) = {
      val path = dataPath(name)
      new PrintWriter(path) { write(json); close }
      ctx.read.json("file://" + path).registerTempTable(name)
    }

    // Note test_another is missing the category column
    tempTable("test_one", """{"category" : "A", "num" : 5}""")
    tempTable("test_another", """{"num" : 5}""")

    // +--------+
    // |category|
    // +--------+
    // |       A|
    // |       5|
    // +--------+
    //
    // Instead, an error should have been generated due to incompatible schema
    ctx.sql("select * from test_one union all select * from test_another").show

    // Cleanup
    new File(dataPath("test_one")).delete()
    new File(dataPath("test_another")).delete()
    {code}
    At other times, when the schemas are complex, Spark SQL produces a misleading error about an unresolved Union operator:
    {code}
    scala> ctx.sql("""select * from view_clicks
         | union all
         | select * from view_clicks_aug
         | """)
    15/08/11 02:40:25 INFO ParseDriver: Parsing command: select * from view_clicks union all select * from view_clicks_aug
    15/08/11 02:40:25 INFO ParseDriver: Parse Completed
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
    org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:126)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
      at scala.collection.immutable.List.foreach(List.scala:318)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
      at scala.collection.immutable.List.foreach(List.scala:318)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
      at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
      at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
      at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
      at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    {code}
    via Apache's JIRA by Simeon Simeonov
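    Until this is fixed upstream, one defensive option is to compare the two DataFrames' schemas explicitly before the union. A minimal sketch, assuming Spark 1.x and the ctx HiveContext from the examples above; safeUnionAll is a hypothetical helper, not a Spark API:
    {code}
    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: fail fast when column names or types differ,
    // instead of letting UNION ALL silently mis-align the schemas as above.
    def safeUnionAll(left: DataFrame, right: DataFrame): DataFrame = {
      // Compare (name, type) pairs positionally, mirroring Hive's UNION ALL rule
      val leftCols  = left.schema.fields.map(f => (f.name, f.dataType))
      val rightCols = right.schema.fields.map(f => (f.name, f.dataType))
      require(leftCols.sameElements(rightCols),
        s"Incompatible schemas for UNION ALL: " +
          s"${left.schema.simpleString} vs ${right.schema.simpleString}")
      left.unionAll(right) // unionAll is the Spark 1.x DataFrame API
    }

    // Usage with the temp tables registered in the reproduction above:
    // safeUnionAll(ctx.table("test_one"), ctx.table("test_another"))
    // => IllegalArgumentException with an explicit schema mismatch message
    {code}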
  • Add custom field to Spark ML LabeledPoint
    via Stack Overflow by Jihun No
  • GitHub comment 624#173841576
    via GitHub by nsphung
    • org.apache.spark.sql.AnalysisException: Table not found: `dev`.`emp`; line 1 pos 18
      at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
      at scala.collection.immutable.List.foreach(List.scala:318)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
      at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
      at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
      at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
      at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
      at com.impetus.idw.data.connector.SparkHiveToHdfs.main(SparkHiveToHdfs.java:30)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
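      A Table not found failure like this usually means the metastore the HiveContext is talking to does not contain the table, rather than a problem with the SQL itself. A minimal pre-flight sketch, assuming Spark 1.x, the ctx HiveContext from the examples above, and the dev/emp names taken from this trace:
      {code}
      // Hypothetical pre-check: confirm the table is visible to the HiveContext
      // before querying, to distinguish a metastore misconfiguration from a bad query.
      val db  = "dev" // assumed database name from the trace above
      val tbl = "emp" // assumed table name from the trace above
      if (!ctx.tableNames(db).contains(tbl)) {
        sys.error(s"Table $db.$tbl is not in the metastore; check that the " +
          "HiveContext is configured against the intended Hive warehouse.")
      }
      val emp = ctx.sql(s"select * from $db.$tbl")
      {code}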
