java.lang.ClassCastException: java.lang.String cannot be cast to [B

JIRA | Zongheng Yang | 2 years ago

Minimal example:

{noformat}
rdd <- textFile(sc, "./README.md")
lengths <- lapply(rdd, function(x) { length(x) })
take(lengths, 5)    # works
lengths10 <- lapply(lengths, function(x) { x + 10 })
take(lengths10, 2)  # breaks
{noformat}

Stacktrace:

{noformat}
Exception in thread "stdin writer for R" java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:312)
	at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:310)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4.run(RRDD.scala:310)
Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
  invalid 'n' argument
Calls: unserialize -> readRawLen -> readBin
Execution halted
14/11/17 12:22:31 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NullPointerException
	at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:695)
14/11/17 12:22:31 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
	edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	org.apache.spark.scheduler.Task.run(Task.scala:54)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	java.lang.Thread.run(Thread.java:695)
14/11/17 12:22:31 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
Error in .jcall(jrdd, "[Ljava/util/List;", "collectPartitions", .jarray(as.integer(index))) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
	edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	org.apache.spark.scheduler.Task.run(Task.scala:54)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	java.lang.Thread.run(Thread.java:695)
{noformat}

This is likely related to [this line|https://github.com/amplab-extras/SparkR-pkg/blob/master/pkg/R/RDD.R#L122]; changing that value to FALSE seems to eliminate the issue. One workaround is to cache the `lengths` RDD first. We should figure out what exactly the issue is and, perhaps in the meantime, add more documentation in the code on how pipelining works (e.g., state invariants on some key variables).
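The caching workaround mentioned above can be sketched as follows. This is a minimal sketch against the SparkR-pkg API used in the reproduction, and it assumes `cache()` behaves as the standard SparkR-pkg RDD persistence call; it requires a running Spark context (`sc`), so treat it as illustrative rather than verified:

{noformat}
rdd <- textFile(sc, "./README.md")
lengths <- lapply(rdd, function(x) { length(x) })

# Persist the intermediate RDD before chaining a second lapply().
# This prevents the second transformation from being pipelined onto
# the first, which is the code path that triggers the
# ClassCastException (String vs. byte[] serialization mismatch).
cache(lengths)
take(lengths, 5)    # materializes the cached partitions

lengths10 <- lapply(lengths, function(x) { x + 10 })
take(lengths10, 2)  # reads from the cached RDD instead of re-pipelining
{noformat}

The sketch just reorders the original reproduction so that `lengths` is cached and evaluated once before being transformed again; it does not address the underlying pipelining bug.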


    Root Cause Analysis

    1. java.lang.ClassCastException

      java.lang.String cannot be cast to [B

      at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply()
    2. edu.berkeley.cs
      RRDD$$anon$4$$anonfun$run$3.apply
      1. edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:312)
      2. edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:310)
      2 frames
    3. Scala
      AbstractIterator.foreach
      1. scala.collection.Iterator$class.foreach(Iterator.scala:727)
      2. scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      2 frames
    4. edu.berkeley.cs
      RRDD$$anon$4.run
      1. edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4.run(RRDD.scala:310)
      1 frame