org.apache.spark.SparkException: Task failed while writing rows.

Apache's JIRA Issue Tracker | Erik Selin | 1 year ago
    When running a large Spark SQL query including multiple joins, I see tasks failing with the following trace:

    {code}
    java.lang.NegativeArraySizeException
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:36)
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:188)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.execution.joins.OneSideOuterIterator.getRow(SortMergeOuterJoin.scala:288)
        at org.apache.spark.sql.execution.RowIteratorToScala.next(RowIterator.scala:76)
        at org.apache.spark.sql.execution.RowIteratorToScala.next(RowIterator.scala:62)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    {code}

    From the Spark code it looks like this is due to an integer overflow when growing a buffer's length. The offending line, {{BufferHolder.java:36}}, reads as follows in the version I'm running:

    {code}
    final byte[] tmp = new byte[length * 2];
    {code}

    This seems to indicate that the buffer will never be able to hold more than 2 GB worth of data, and in practice it will hold even less, since any length > 1073741824 causes an integer overflow and turns the new buffer size negative. I hope I'm simply missing some critical config setting, but it still seems strange that there is such a (rather low) upper limit on these buffers.

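    The arithmetic in the report is easy to reproduce outside Spark. The following standalone Java snippet (illustrative only; the class and variable names are mine, not Spark's) shows how doubling an int length above 2^30 wraps negative, so an allocation following the same pattern as new byte[length * 2] in BufferHolder.grow throws NegativeArraySizeException:

    {code}
    public class NegativeGrowDemo {
        public static void main(String[] args) {
            // Any length strictly greater than 2^30 (1073741824) overflows when doubled.
            int length = 1_200_000_000;   // ~1.2 billion bytes already buffered
            int doubled = length * 2;     // 2_400_000_000 does not fit in an int; wraps to -1_894_967_296
            System.out.println("length * 2 = " + doubled);

            try {
                // Same pattern as the offending line in BufferHolder.grow:
                byte[] tmp = new byte[doubled];
            } catch (NegativeArraySizeException e) {
                System.out.println("caught: " + e);
            }
        }
    }
    {code}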

    Root Cause Analysis

    1. java.lang.NegativeArraySizeException (no message provided)
       thrown at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow()

    2. Spark Project Catalyst (GeneratedClass$SpecificUnsafeProjection.apply, 3 frames)
       at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:45)
       at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:196)
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)

    3. org.apache.spark (InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply, 3 frames)
       at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:360)
       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)

    4. Spark (Executor$TaskRunner.run, 3 frames)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
       at org.apache.spark.scheduler.Task.run(Task.scala:88)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)

    5. Java RT (Thread.run, 3 frames)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
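    Since the failure ultimately comes from the unchecked new byte[length * 2] doubling, one generic way to make such growth overflow-safe is to widen the arithmetic to long and cap the request at the JVM array limit. The sketch below only illustrates that pattern under assumed semantics (the SafeBufferGrow class, its grow signature, and the neededSize parameter are hypothetical); it is not Spark's BufferHolder and not the fix that went into Spark:

    {code}
    // Hypothetical helper illustrating overflow-safe buffer growth; not Spark code.
    public final class SafeBufferGrow {

        // JVMs refuse array lengths very close to Integer.MAX_VALUE; leave some headroom.
        private static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

        static byte[] grow(byte[] buffer, int neededSize) {
            long required = (long) buffer.length + neededSize;   // widen to long before adding
            if (required > MAX_ARRAY_LENGTH) {
                throw new IllegalStateException(
                    "Cannot grow buffer: " + required + " bytes exceeds the 2 GB array limit");
            }
            // Double the capacity, but never past the array limit and never below what is needed.
            long doubled = Math.min((long) buffer.length * 2, MAX_ARRAY_LENGTH);
            int newLength = (int) Math.max(doubled, required);
            byte[] tmp = new byte[newLength];
            System.arraycopy(buffer, 0, tmp, 0, buffer.length);
            return tmp;
        }

        public static void main(String[] args) {
            byte[] buf = grow(new byte[16], 100);   // grows to 116 bytes
            System.out.println("new capacity: " + buf.length);
        }
    }
    {code}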