org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();

GitHub | kevinushey | 4 months ago
  1.

    allow 'tbl_spark's to be constructed from streaming Spark DataFrames

    GitHub | 4 months ago | kevinushey
    org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
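    This error means a batch action such as show(), count(), or a regular write() was invoked on a streaming DataFrame; streaming plans can only be materialized through writeStream.start(). A minimal sketch of the distinction, meant to run in spark-shell (Spark 2.x); the input path and schema here are hypothetical:
    {code}
    import org.apache.spark.sql.types._

    // readStream requires an explicit schema; both the schema and the path are made up for illustration
    val schema = new StructType().add("category", StringType).add("num", LongType)
    val streamingDF = spark.readStream.schema(schema).json("/data/incoming")

    // streamingDF.show() // would throw: Queries with streaming sources must be executed with writeStream.start();

    // The supported way to run the query: start a streaming sink
    val query = streamingDF.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
    {code}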
  2.

    According to the [Hive Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union] for UNION ALL:
    {quote}
    The number and names of columns returned by each select_statement have to be the same. Otherwise, a schema error is thrown.
    {quote}
    Spark SQL silently swallows an error when the tables being joined with UNION ALL have the same number of columns but different names.

    Reproducible example:
    {code}
    // This test is meant to run in spark-shell
    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode

    val ctx = sqlContext.asInstanceOf[HiveContext]
    import ctx.implicits._

    def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

    def tempTable(name: String, json: String) = {
      val path = dataPath(name)
      new PrintWriter(path) { write(json); close }
      ctx.read.json("file://" + path).registerTempTable(name)
    }

    // Note category vs. cat names of first column
    tempTable("test_one", """{"category" : "A", "num" : 5}""")
    tempTable("test_another", """{"cat" : "A", "num" : 5}""")

    // +--------+---+
    // |category|num|
    // +--------+---+
    // |       A|  5|
    // |       A|  5|
    // +--------+---+
    //
    // Instead, an error should have been generated due to incompatible schema
    ctx.sql("select * from test_one union all select * from test_another").show

    // Cleanup
    new File(dataPath("test_one")).delete()
    new File(dataPath("test_another")).delete()
    {code}
    When the number of columns is different, Spark can even mix in datatypes. Reproducible example (requires a new spark-shell session):
    {code}
    // This test is meant to run in spark-shell
    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode

    val ctx = sqlContext.asInstanceOf[HiveContext]
    import ctx.implicits._

    def dataPath(name: String) = sys.env("HOME") + "/" + name + ".jsonlines"

    def tempTable(name: String, json: String) = {
      val path = dataPath(name)
      new PrintWriter(path) { write(json); close }
      ctx.read.json("file://" + path).registerTempTable(name)
    }

    // Note test_another is missing the category column
    tempTable("test_one", """{"category" : "A", "num" : 5}""")
    tempTable("test_another", """{"num" : 5}""")

    // +--------+
    // |category|
    // +--------+
    // |       A|
    // |       5|
    // +--------+
    //
    // Instead, an error should have been generated due to incompatible schema
    ctx.sql("select * from test_one union all select * from test_another").show

    // Cleanup
    new File(dataPath("test_one")).delete()
    new File(dataPath("test_another")).delete()
    {code}
    At other times, when the schemas are complex, Spark SQL produces a misleading error about an unresolved Union operator:
    {code}
    scala> ctx.sql("""select * from view_clicks
         | union all
         | select * from view_clicks_aug
         | """)
    15/08/11 02:40:25 INFO ParseDriver: Parsing command: select * from view_clicks union all select * from view_clicks_aug
    15/08/11 02:40:25 INFO ParseDriver: Parse Completed
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO HiveMetaStore: 0: get_table : db=default tbl=view_clicks_aug
    15/08/11 02:40:25 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=get_table : db=default tbl=view_clicks_aug
    org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:126)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:98)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:97)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:97)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:42)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    {code}

    Apache's JIRA Issue Tracker | 1 year ago | Simeon Simeonov
    org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
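    One way to avoid the silent name mismatch described above is to rename the right-hand side positionally before the union. A minimal sketch for spark-shell, reusing ctx and the temp tables from the reproducible example (toDF here renames columns by position):
    {code}
    val one = ctx.sql("select * from test_one")         // columns: category, num
    val another = ctx.sql("select * from test_another") // columns: cat, num

    // Align column names explicitly so nothing is swallowed silently
    val aligned = another.toDF(one.columns: _*)
    one.unionAll(aligned).show()
    {code}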

  3.

    Having count(distinct) not working with hivecontext query in spark 1.6

    Stack Overflow | 1 month ago | Yash_spark
    org.apache.spark.sql.AnalysisException: resolved attribute(s) gid#687,z#688 missing from x#685,y#252,z#255 in operator !Aggregate [x#685,y#252], [cast(((count(if ((gid#687 = 1)) z#688 else null),mode=Complete,isDistinct=false) > cast(1 as bigint)) as boolean) AS havingCondition#686,x#685,y#252];
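    One workaround worth trying for this kind of HAVING-with-count(distinct) failure is to move the predicate into an outer query, so the distinct count is fully resolved before filtering. A hedged sketch; the table and column names (some_table, x, y, z) are hypothetical stand-ins for the attributes in the error message:
    {code}
    // Meant to run in spark-shell with a HiveContext, as in the question
    val result = ctx.sql("""
      SELECT x, y, cnt
      FROM (
        SELECT x, y, count(DISTINCT z) AS cnt
        FROM some_table
        GROUP BY x, y
      ) t
      WHERE cnt > 1
    """)
    result.show()
    {code}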
  4.

    Motif find in GraphFrames gives me org.apache.spark.sql.AnalysisException

    Stack Overflow | 3 weeks ago | Michael Qiu
    org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the same number of columns, but the left table has 5 columns and the right has 6;


    Root Cause Analysis

    1. org.apache.spark.sql.AnalysisException

      Queries with streaming sources must be executed with writeStream.start();

      at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError()
    2. Spark Project Catalyst
      TreeNode$$anonfun$foreachUp$1.apply
      1. org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
      2. org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:33)
      3. org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:31)
      4. org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
      5. org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
      6. org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
      6 frames
    3. Scala
      List.foreach
      1. scala.collection.immutable.List.foreach(List.scala:381)
      1 frame
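    The trace shows Catalyst's UnsupportedOperationChecker rejecting the plan while it is analyzed as a batch query. A defensive sketch for code that may receive either kind of DataFrame, branching on isStreaming (the df placeholder and sink settings are illustrative only):
    {code}
    // df may be a batch or a streaming DataFrame (Spark 2.x)
    if (df.isStreaming) {
      // Streaming plans must be started through writeStream
      df.writeStream.format("console").outputMode("append").start()
    } else {
      // Batch plans can use ordinary actions
      df.show()
    }
    {code}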