java.lang.NullPointerException

JIRA | Zongheng Yang | 2 years ago
  1.

    Minimal example:
    {noformat}
    rdd <- textFile(sc, "./README.md")
    lengths <- lapply(rdd, function(x) { length(x) })
    take(lengths, 5)    # works
    lengths10 <- lapply(lengths, function(x) { x + 10 })
    take(lengths10, 2)  # breaks
    {noformat}
    Stacktrace:
    {noformat}
    Exception in thread "stdin writer for R" java.lang.ClassCastException: java.lang.String cannot be cast to [B
        at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:312)
        at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:310)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4.run(RRDD.scala:310)
    Error in readBin(con, raw(), as.integer(dataLen), endian = "big") : invalid 'n' argument
    Calls: unserialize -> readRawLen -> readBin
    Execution halted
    14/11/17 12:22:31 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:695)
    14/11/17 12:22:31 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
        edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        java.lang.Thread.run(Thread.java:695)
    14/11/17 12:22:31 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
    Error in .jcall(jrdd, "[Ljava/util/List;", "collectPartitions", .jarray(as.integer(index))) :
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
        edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        java.lang.Thread.run(Thread.java:695)
    {noformat}
    This is likely related to [this line|https://github.com/amplab-extras/SparkR-pkg/blob/master/pkg/R/RDD.R#L122]; changing it to FALSE seems to eliminate the issue. One workaround is to cache the `lengths` RDD first (see the sketch below).
    We should figure out what exactly the issue is, and perhaps in the meantime add more documentation in the code on how pipelining works (e.g. state invariants on some key variables).

    JIRA | 2 years ago | Zongheng Yang
    java.lang.NullPointerException
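
    A minimal sketch of the caching workaround mentioned in the report, assuming the SparkR-pkg API used in the example (textFile, lapply, cache, take); persisting the intermediate RDD keeps the second lapply from being pipelined onto the first, which is what the reporter says avoids the NPE:
    {code}
    # Sketch of the reported workaround, not a fix: cache() marks `lengths` for
    # persistence, so `lengths10` is computed from the materialized partitions
    # instead of being pipelined onto the textFile -> lapply chain.
    rdd <- textFile(sc, "./README.md")
    lengths <- lapply(rdd, function(x) { length(x) })
    cache(lengths)              # break the pipeline here
    take(lengths, 5)            # materializes the cached partitions
    lengths10 <- lapply(lengths, function(x) { x + 10 })
    take(lengths10, 2)          # now reads from the cached RDD
    {code}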
  2.

    How to build logistic regression model in SparkR

    Stack Overflow | 2 years ago
    java.lang.NullPointerException
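
    The linked question asks how to fit a logistic regression in SparkR. A rough, hypothetical sketch, assuming the SparkR bundled with Spark 1.5+ (whose glm() supports family = "binomial") rather than the SparkR-pkg API quoted in the JIRA entries; the column names here are made up for illustration:
    {code}
    # Hypothetical sketch (SparkR from Spark 1.5+): fit a binomial GLM on a
    # toy DataFrame; `label`, `f1`, `f2` are illustrative column names.
    library(SparkR)
    sc <- sparkR.init()
    sqlContext <- sparkRSQL.init(sc)
    localDf <- data.frame(label = c(0, 1, 0, 1),
                          f1 = c(1.2, 3.4, 0.5, 2.8),
                          f2 = c(7, 2, 9, 1))
    df <- createDataFrame(sqlContext, localDf)
    model <- glm(label ~ f1 + f2, data = df, family = "binomial")
    preds <- predict(model, newData = df)
    head(select(preds, "label", "prediction"))
    {code}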
  3.

    Hi, I am having an issue that seems similar to SPARKR-17, except I am able to run pi.R just fine but not wordcount.R. I have a Spark master running and a single worker, and all the Spark examples work, including JavaWordCount. I ran the following command: {{./sparkR examples/wordcount.R spark://[host]:7077 README.md}} (I get the same error with the master set to "local", as well as with various ways of identifying the file, including the full path and the "file:///..." protocol.)
    h3. Console Output
    {code}
    Loading required package: SparkR
    Loading required package: methods
    Loading required package: rJava
    [SparkR] Initializing with classpath /Users/[user]/Projects/SparkR-pkg/lib/SparkR/sparkr-assembly-0.1.jar
    14/02/26 08:35:12 INFO Slf4jLogger: Slf4jLogger started
    2014-02-26 08:35:13.632 R[14476:d0b] Unable to load realm mapping info from SCDynamicStore
    14/02/26 08:35:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/02/26 08:35:14 WARN LoadSnappy: Snappy native library not loaded
    14/02/26 08:35:14 INFO FileInputFormat: Total input paths to process : 1
    14/02/26 08:35:21 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
    14/02/26 08:35:21 WARN TaskSetManager: Loss was due to java.lang.NullPointerException
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    14/02/26 08:35:22 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
    14/02/26 08:35:23 WARN TaskSetManager: Lost TID 5 (task 0.0:0)
    14/02/26 08:35:25 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
    14/02/26 08:35:25 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting job
    Error in .jcall(getJRDD(rdd), "Ljava/util/List;", "collect") :
      org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.NullPointerException)
    Calls: collect -> collect -> .local -> .jcall -> .jcheck -> .Call
    Execution halted
    {code}
    h3. Worker Log
    {code}
    Spark Executor Command: "java" "-cp" ":/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/conf:/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@[host]:56220/user/CoarseGrainedScheduler" "0" "[host]" "12" "akka.tcp://sparkWorker@[host]:58525/user/Worker" "app-20140226083514-0002"
    ========================================
    log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@[host]:56220/user/CoarseGrainedScheduler
    14/02/26 08:35:15 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@[host]:58525/user/Worker
    14/02/26 08:35:15 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@[host]:58525/user/Worker
    14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
    14/02/26 08:35:15 INFO Slf4jLogger: Slf4jLogger started
    14/02/26 08:35:15 INFO Remoting: Starting remoting
    14/02/26 08:35:15 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@[host]:56228]
    14/02/26 08:35:15 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@[host]:56228]
    14/02/26 08:35:15 INFO SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@[host]:56220/user/BlockManagerMaster
    14/02/26 08:35:16 INFO DiskBlockManager: Created local directory at /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/spark-local-20140226083516-742b
    14/02/26 08:35:16 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
    14/02/26 08:35:16 INFO ConnectionManager: Bound socket to port 56230 with id = ConnectionManagerId([host],56230)
    14/02/26 08:35:16 INFO BlockManagerMaster: Trying to register BlockManager
    14/02/26 08:35:16 INFO BlockManagerMaster: Registered BlockManager
    14/02/26 08:35:16 INFO SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@[host]:56220/user/MapOutputTracker
    14/02/26 08:35:16 INFO HttpFileServer: HTTP File server directory is /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/spark-f56850c3-b1e4-4e1d-ade8-d3e88ecb740f
    14/02/26 08:35:16 INFO HttpServer: Starting HTTP Server
    14/02/26 08:35:16 INFO CoarseGrainedExecutorBackend: Got assigned task 0
    14/02/26 08:35:16 INFO CoarseGrainedExecutorBackend: Got assigned task 1
    2014-02-26 08:35:16.477 java[14486:b803] Unable to load realm mapping info from SCDynamicStore
    14/02/26 08:35:16 INFO Executor: Running task ID 0
    14/02/26 08:35:16 INFO Executor: Running task ID 1
    14/02/26 08:35:16 INFO Executor: Fetching http://130.20.186.189:56223/jars/sparkr-assembly-0.1.jar with timestamp 1393432514043
    14/02/26 08:35:16 INFO Utils: Fetching http://130.20.186.189:56223/jars/sparkr-assembly-0.1.jar to /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/fetchFileTemp4469264870321020627.tmp
    14/02/26 08:35:17 INFO Executor: Adding file:/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/work/app-20140226083514-0002/0/./sparkr-assembly-0.1.jar to class loader
    14/02/26 08:35:17 INFO HttpBroadcast: Started reading broadcast variable 0
    14/02/26 08:35:17 INFO MemoryStore: ensureFreeSpace(39208) called with curMem=0, maxMem=309225062
    14/02/26 08:35:17 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 38.3 KB, free 294.9 MB)
    14/02/26 08:35:17 INFO HttpBroadcast: Reading broadcast variable 0 took 0.149631 s
    14/02/26 08:35:17 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:17 INFO HadoopRDD: Input split: file:/Users/[user]/Projects/SparkR-pkg/README.md:1547+1547
    14/02/26 08:35:17 INFO HadoopRDD: Input split: file:/Users/[user]/Projects/SparkR-pkg/README.md:0+1547
    14/02/26 08:35:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/02/26 08:35:17 WARN LoadSnappy: Snappy native library not loaded
    14/02/26 08:35:19 INFO Executor: Serialized size of result for 1 is 751
    14/02/26 08:35:19 INFO Executor: Serialized size of result for 0 is 751
    14/02/26 08:35:19 INFO Executor: Sending result for 1 directly to driver
    14/02/26 08:35:19 INFO Executor: Sending result for 0 directly to driver
    14/02/26 08:35:19 INFO Executor: Finished task ID 0
    14/02/26 08:35:19 INFO Executor: Finished task ID 1
    14/02/26 08:35:19 INFO CoarseGrainedExecutorBackend: Got assigned task 2
    14/02/26 08:35:19 INFO Executor: Running task ID 2
    14/02/26 08:35:19 INFO CoarseGrainedExecutorBackend: Got assigned task 3
    14/02/26 08:35:19 INFO Executor: Running task ID 3
    14/02/26 08:35:19 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:19 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:20 INFO MapOutputTracker: Updating epoch to 1 and clearing cache
    14/02/26 08:35:20 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
    14/02/26 08:35:20 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
    14/02/26 08:35:20 INFO MapOutputTracker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@[host]:56220/user/MapOutputTracker#-260655231]
    14/02/26 08:35:20 INFO MapOutputTracker: Got the output locations
    14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
    14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 0 non-zero-bytes blocks out of 2 blocks
    14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 6 ms
    14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 6 ms
    14/02/26 08:35:21 INFO Executor: Serialized size of result for 3 is 807
    14/02/26 08:35:21 INFO Executor: Sending result for 3 directly to driver
    14/02/26 08:35:21 INFO Executor: Finished task ID 3
    Error in item[[2]] : subscript out of bounds
    Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
    Execution halted
    14/02/26 08:35:21 ERROR Executor: Exception in task ID 2
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    14/02/26 08:35:21 INFO CoarseGrainedExecutorBackend: Got assigned task 4
    14/02/26 08:35:21 INFO Executor: Running task ID 4
    14/02/26 08:35:21 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:21 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
    14/02/26 08:35:21 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 0 ms
    Error in item[[2]] : subscript out of bounds
    Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
    Execution halted
    14/02/26 08:35:22 ERROR Executor: Exception in task ID 4
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    14/02/26 08:35:22 INFO CoarseGrainedExecutorBackend: Got assigned task 5
    14/02/26 08:35:22 INFO Executor: Running task ID 5
    14/02/26 08:35:22 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:22 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
    14/02/26 08:35:22 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 0 ms
    Error in item[[2]] : subscript out of bounds
    Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
    Execution halted
    14/02/26 08:35:23 ERROR Executor: Exception in task ID 5
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    14/02/26 08:35:23 INFO CoarseGrainedExecutorBackend: Got assigned task 6
    14/02/26 08:35:23 INFO Executor: Running task ID 6
    14/02/26 08:35:23 INFO BlockManager: Found block broadcast_0 locally
    14/02/26 08:35:23 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
    14/02/26 08:35:23 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 1 ms
    Error in item[[2]] : subscript out of bounds
    Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
    Execution halted
    14/02/26 08:35:25 ERROR Executor: Exception in task ID 6
    java.lang.NullPointerException
        at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    {code}
    h4. Relevant environment variables
    {code}
    SPARK_HOME=/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/
    SPARK_LOCAL_IP=<ip address>
    {code}
    Thanks for the help.
    (edit: updated the format)

    JIRA | 3 years ago | rperko3
    java.lang.NullPointerException
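
    For context, the failing script follows the standard SparkR-pkg word-count pattern; a rough sketch of that flow (assumed from the usual examples/wordcount.R structure, not a verbatim copy) is below. Note that in the worker log each NPE is preceded by an R-side failure ("Error in item[[2]] : subscript out of bounds") in the lapply chain of the post-shuffle stage, which is the stage RRDD.compute() drives on the worker:
    {code}
    # Rough sketch of the word-count flow (assumed, not a verbatim copy of
    # examples/wordcount.R). The worker-side NPE shows up while computing
    # the post-shuffle stage that reduceByKey() introduces.
    library(SparkR)
    sc <- sparkR.init(master = "spark://[host]:7077")
    lines <- textFile(sc, "README.md")
    words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
    pairs <- lapply(words, function(word) { list(word, 1L) })
    counts <- reduceByKey(pairs, "+", 2L)
    output <- collect(counts)
    {code}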


    Root Cause Analysis

    1. java.lang.NullPointerException

      No message provided

      at edu.berkeley.cs.amplab.sparkr.RRDD.compute()
    2. edu.berkeley.cs
      RRDD.compute
      1. edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
      1 frame
    3. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
      2. org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
      3. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
      4. org.apache.spark.scheduler.Task.run(Task.scala:54)
      5. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
      5 frames
    4. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      3. java.lang.Thread.run(Thread.java:695)
      3 frames