org.apache.spark.sql.AnalysisException: Table not found: `dev`.`emp`; line 1 pos 18

Stack Overflow | dev ツ | 2 weeks ago
  1. 0

    According to the [Hive Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union] for UNION ALL:

    {quote}
    The number and names of columns returned by each select_statement have to be the same. Otherwise, a schema error is thrown.
    {quote}

    Spark SQL silently swallows an error when the tables being joined with UNION ALL have the same number of columns but different names.

    Reproducible example:

{code}
// This test is meant to run in spark-shell
import java.io.File
import java.io.PrintWriter
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SaveMode

val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

def tempTable(name: String, json: String) = {
  val path = dataPath(name)
  new PrintWriter(path) { write(json); close }
  ctx.read.json("file://" + path).registerTempTable(name)
}

// Note category vs. cat names of first column
tempTable("test_one", """{"category" : "A", "num" : 5}""")
tempTable("test_another", """{"cat" : "A", "num" : 5}""")

// +--------+---+
// |category|num|
// +--------+---+
// |       A|  5|
// |       A|  5|
// +--------+---+
//
// Instead, an error should have been generated due to incompatible schemas
ctx.sql("select * from test_one union all select * from test_another").show

// Cleanup
new File(dataPath("test_one")).delete()
new File(dataPath("test_another")).delete()
{code}

    When the number of columns is different, Spark can even mix in datatypes.

    Reproducible example (requires a new spark-shell session):

{code}
// This test is meant to run in spark-shell
import java.io.File
import java.io.PrintWriter
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SaveMode

val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

def tempTable(name: String, json: String) = {
  val path = dataPath(name)
  new PrintWriter(path) { write(json); close }
  ctx.read.json("file://" + path).registerTempTable(name)
}

// Note test_another is missing category column
tempTable("test_one", """{"category" : "A", "num" : 5}""")
tempTable("test_another", """{"num" : 5}""")

// +--------+
// |category|
// +--------+
// |       A|
// |       5|
// +--------+
//
// Instead, an error should have been generated due to incompatible schemas
ctx.sql("select * from test_one union all select * from test_another").show

// Cleanup
new File(dataPath("test_one")).delete()
new File(dataPath("test_another")).delete()
{code}

    At other times, when the schemas are complex, Spark SQL produces a misleading error about an unresolved Union operator:

{code}
scala> ctx.sql("""select * from view_clicks
     | union all
     | select * from view_clicks_aug
     | """)
15/08/11 02:40:25 INFO ParseDriver: Parsing command: select * from view_clicks union all select * from view_clicks_aug
15/08/11 02:40:25 INFO ParseDriver: Parse Completed
15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:126)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
{code}

    Apache's JIRA Issue Tracker | 1 year ago | Simeon Simeonov
    org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
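    Since Spark at this point did not enforce Hive's rule that UNION ALL branches expose identical column names, a defensive pre-check is one way to surface the mismatch early. The following is a minimal sketch, not part of the original report: requireSameColumns is a hypothetical helper, and the table names reuse the temp tables registered in the reproduction above.

{code}
// Minimal sketch (hypothetical helper): fail fast when two registered tables
// expose different column names, instead of letting UNION ALL mix them up.
import org.apache.spark.sql.hive.HiveContext

val ctx = sqlContext.asInstanceOf[HiveContext] // spark-shell's sqlContext

def requireSameColumns(left: String, right: String): Unit = {
  val l = ctx.table(left).columns.toSeq  // column names, in order
  val r = ctx.table(right).columns.toSeq
  require(l == r, s"UNION ALL schema mismatch: $left has $l, $right has $r")
}

requireSameColumns("test_one", "test_another") // throws before the silent union
ctx.sql("select * from test_one union all select * from test_another").show()
{code}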
  2. 0

    Spark 1.5.2: org.apache.spark.sql.AnalysisException: unresolved operator 'Union;

    Stack Overflow | 6 months ago | Neel
    org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
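    The usual workaround for "unresolved operator 'Union" on Spark 1.x is to make both branches project an identical column list, with explicit casts where types diverge. A hedged sketch; the table and column names below are placeholders, not taken from the linked question:

{code}
// Sketch: spell out matching select lists so the analyzer can resolve Union.
// view_clicks / view_clicks_aug and their columns are placeholder names.
import org.apache.spark.sql.hive.HiveContext

val ctx = sqlContext.asInstanceOf[HiveContext] // spark-shell's sqlContext

val unioned = ctx.sql(
  """select category, cast(num as int) as num from view_clicks
    |union all
    |select category, cast(num as int) as num from view_clicks_aug""".stripMargin)
unioned.show()
{code}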
  3. 0

    Table not found while creating dataframe from Hive Table

    Stack Overflow | 2 weeks ago | dev ツ
    org.apache.spark.sql.AnalysisException: Table not found: `dev`.`emp`; line 1 pos 18
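    A common cause of "Table not found" for a Hive-managed table is querying through a plain SQLContext, or through a context that cannot see the Hive metastore. A minimal sketch, assuming Spark 1.x, an existing dev.emp table, and hive-site.xml on the classpath (the app name is borrowed from the trace below; everything else is an assumption):

{code}
// Sketch: Hive tables resolve only through a metastore-backed HiveContext.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("SparkHiveToHdfs"))
val hiveCtx = new HiveContext(sc) // not: new SQLContext(sc)

// Qualify the table with its database so lookup does not fall back to `default`.
hiveCtx.sql("select * from dev.emp").show()
{code}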
  4. 0

    cannot resolve xyz given input columns error when creating Spark dataset

    Stack Overflow | 3 months ago | fatdragon
    org.apache.spark.sql.AnalysisException: cannot resolve '`sepalWidth`' given input columns: [_c1, _c3, _c0, _c4, _c2];
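    Default column names _c0 ... _c4 mean the CSV was loaded without a header row or explicit schema, so a field like sepalWidth has nothing to bind to. A hedged Spark 2.x sketch; the file path and the IrisRow case class are assumptions inferred from the error message:

{code}
// Sketch: name the columns (via header or schema) before .as[...] so fields
// like sepalWidth resolve. IrisRow and iris.csv are illustrative assumptions.
import org.apache.spark.sql.SparkSession

case class IrisRow(sepalLength: Double, sepalWidth: Double,
                   petalLength: Double, petalWidth: Double, species: String)

val spark = SparkSession.builder().appName("iris-example").getOrCreate()
import spark.implicits._

val iris = spark.read
  .option("header", "true")       // take column names from the first row
  .option("inferSchema", "true")  // parse numeric columns as doubles
  .csv("iris.csv")
  .as[IrisRow]                    // now resolves sepalWidth, petalLength, ...
iris.show()
{code}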

    Root Cause Analysis

    1. org.apache.spark.sql.AnalysisException

      Table not found: `dev`.`emp`; line 1 pos 18

      at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis()
    2. Spark Project Catalyst
      TreeNode$$anonfun$foreachUp$1.apply
      1. org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      2. org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
      3. org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
      4. org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
      5. org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
      6. org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
      6 frames
    3. Scala
      List.foreach
      1. scala.collection.immutable.List.foreach(List.scala:318)
      1 frame
    4. Spark Project Catalyst
      Analyzer.checkAnalysis
      1. org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
      2. org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
      3. org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
      3 frames
    5. Spark Project SQL
      SQLContext.sql
      1. org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
      2. org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
      3. org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
      4. org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
      4 frames
    6. com.impetus.idw
      SparkHiveToHdfs.main
      1. com.impetus.idw.data.connector.SparkHiveToHdfs.main(SparkHiveToHdfs.java:30)
      1 frame
    7. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:497)
      4 frames
    8. Spark Project YARN Stable API
      ApplicationMaster$$anon$2.run
      1. org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
      1 frame