java.lang.NullPointerException

JIRA | rperko3 | 3 years ago
  1.

    Hi, I am having an issue that seems similar to SPARKR-17, except I am able to run pi.R just fine but not wordcount.R. I have a Spark master running and a single worker, and all the Spark examples work, including JavaWordCount. I ran the following command: {{./sparkR examples/wordcount.R spark://[host]:7077 README.md}} (I get the same error with the master set to "local", as well as with various ways of identifying the file, including the full path and the "file:///..." protocol.)

    h3. Console Output

{code}
Loading required package: SparkR
Loading required package: methods
Loading required package: rJava
[SparkR] Initializing with classpath /Users/[user]/Projects/SparkR-pkg/lib/SparkR/sparkr-assembly-0.1.jar
14/02/26 08:35:12 INFO Slf4jLogger: Slf4jLogger started
2014-02-26 08:35:13.632 R[14476:d0b] Unable to load realm mapping info from SCDynamicStore
14/02/26 08:35:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/26 08:35:14 WARN LoadSnappy: Snappy native library not loaded
14/02/26 08:35:14 INFO FileInputFormat: Total input paths to process : 1
14/02/26 08:35:21 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
14/02/26 08:35:21 WARN TaskSetManager: Loss was due to java.lang.NullPointerException
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/02/26 08:35:22 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
14/02/26 08:35:23 WARN TaskSetManager: Lost TID 5 (task 0.0:0)
14/02/26 08:35:25 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
14/02/26 08:35:25 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting job
Error in .jcall(getJRDD(rdd), "Ljava/util/List;", "collect") :
  org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.NullPointerException)
Calls: collect -> collect -> .local -> .jcall -> .jcheck -> .Call
Execution halted
{code}

    h3. Worker Log

{code}
Spark Executor Command: "java" "-cp" ":/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/conf:/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@[host]:56220/user/CoarseGrainedScheduler" "0" "[host]" "12" "akka.tcp://sparkWorker@[host]:58525/user/Worker" "app-20140226083514-0002"
========================================
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@[host]:56220/user/CoarseGrainedScheduler
14/02/26 08:35:15 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@[host]:58525/user/Worker
14/02/26 08:35:15 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@[host]:58525/user/Worker
14/02/26 08:35:15 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
14/02/26 08:35:15 INFO Slf4jLogger: Slf4jLogger started
14/02/26 08:35:15 INFO Remoting: Starting remoting
14/02/26 08:35:15 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@[host]:56228]
14/02/26 08:35:15 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@[host]:56228]
14/02/26 08:35:15 INFO SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@[host]:56220/user/BlockManagerMaster
14/02/26 08:35:16 INFO DiskBlockManager: Created local directory at /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/spark-local-20140226083516-742b
14/02/26 08:35:16 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/02/26 08:35:16 INFO ConnectionManager: Bound socket to port 56230 with id = ConnectionManagerId([host],56230)
14/02/26 08:35:16 INFO BlockManagerMaster: Trying to register BlockManager
14/02/26 08:35:16 INFO BlockManagerMaster: Registered BlockManager
14/02/26 08:35:16 INFO SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@[host]:56220/user/MapOutputTracker
14/02/26 08:35:16 INFO HttpFileServer: HTTP File server directory is /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/spark-f56850c3-b1e4-4e1d-ade8-d3e88ecb740f
14/02/26 08:35:16 INFO HttpServer: Starting HTTP Server
14/02/26 08:35:16 INFO CoarseGrainedExecutorBackend: Got assigned task 0
14/02/26 08:35:16 INFO CoarseGrainedExecutorBackend: Got assigned task 1
2014-02-26 08:35:16.477 java[14486:b803] Unable to load realm mapping info from SCDynamicStore
14/02/26 08:35:16 INFO Executor: Running task ID 0
14/02/26 08:35:16 INFO Executor: Running task ID 1
14/02/26 08:35:16 INFO Executor: Fetching http://130.20.186.189:56223/jars/sparkr-assembly-0.1.jar with timestamp 1393432514043
14/02/26 08:35:16 INFO Utils: Fetching http://130.20.186.189:56223/jars/sparkr-assembly-0.1.jar to /var/folders/hv/vgn69gtn7q54v3x9b9hldz2r3gdyms/T/fetchFileTemp4469264870321020627.tmp
14/02/26 08:35:17 INFO Executor: Adding file:/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/work/app-20140226083514-0002/0/./sparkr-assembly-0.1.jar to class loader
14/02/26 08:35:17 INFO HttpBroadcast: Started reading broadcast variable 0
14/02/26 08:35:17 INFO MemoryStore: ensureFreeSpace(39208) called with curMem=0, maxMem=309225062
14/02/26 08:35:17 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 38.3 KB, free 294.9 MB)
14/02/26 08:35:17 INFO HttpBroadcast: Reading broadcast variable 0 took 0.149631 s
14/02/26 08:35:17 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:17 INFO HadoopRDD: Input split: file:/Users/[user]/Projects/SparkR-pkg/README.md:1547+1547
14/02/26 08:35:17 INFO HadoopRDD: Input split: file:/Users/[user]/Projects/SparkR-pkg/README.md:0+1547
14/02/26 08:35:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/26 08:35:17 WARN LoadSnappy: Snappy native library not loaded
14/02/26 08:35:19 INFO Executor: Serialized size of result for 1 is 751
14/02/26 08:35:19 INFO Executor: Serialized size of result for 0 is 751
14/02/26 08:35:19 INFO Executor: Sending result for 1 directly to driver
14/02/26 08:35:19 INFO Executor: Sending result for 0 directly to driver
14/02/26 08:35:19 INFO Executor: Finished task ID 0
14/02/26 08:35:19 INFO Executor: Finished task ID 1
14/02/26 08:35:19 INFO CoarseGrainedExecutorBackend: Got assigned task 2
14/02/26 08:35:19 INFO Executor: Running task ID 2
14/02/26 08:35:19 INFO CoarseGrainedExecutorBackend: Got assigned task 3
14/02/26 08:35:19 INFO Executor: Running task ID 3
14/02/26 08:35:19 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:19 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:20 INFO MapOutputTracker: Updating epoch to 1 and clearing cache
14/02/26 08:35:20 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
14/02/26 08:35:20 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
14/02/26 08:35:20 INFO MapOutputTracker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@[host]:56220/user/MapOutputTracker#-260655231]
14/02/26 08:35:20 INFO MapOutputTracker: Got the output locations
14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 0 non-zero-bytes blocks out of 2 blocks
14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 6 ms
14/02/26 08:35:20 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 6 ms
14/02/26 08:35:21 INFO Executor: Serialized size of result for 3 is 807
14/02/26 08:35:21 INFO Executor: Sending result for 3 directly to driver
14/02/26 08:35:21 INFO Executor: Finished task ID 3
Error in item[[2]] : subscript out of bounds
Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
Execution halted
14/02/26 08:35:21 ERROR Executor: Exception in task ID 2
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/02/26 08:35:21 INFO CoarseGrainedExecutorBackend: Got assigned task 4
14/02/26 08:35:21 INFO Executor: Running task ID 4
14/02/26 08:35:21 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:21 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
14/02/26 08:35:21 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 0 ms
Error in item[[2]] : subscript out of bounds
Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
Execution halted
14/02/26 08:35:22 ERROR Executor: Exception in task ID 4
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/02/26 08:35:22 INFO CoarseGrainedExecutorBackend: Got assigned task 5
14/02/26 08:35:22 INFO Executor: Running task ID 5
14/02/26 08:35:22 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:22 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
14/02/26 08:35:22 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 0 ms
Error in item[[2]] : subscript out of bounds
Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
Execution halted
14/02/26 08:35:23 ERROR Executor: Exception in task ID 5
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/02/26 08:35:23 INFO CoarseGrainedExecutorBackend: Got assigned task 6
14/02/26 08:35:23 INFO Executor: Running task ID 6
14/02/26 08:35:23 INFO BlockManager: Found block broadcast_0 locally
14/02/26 08:35:23 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-zero-bytes blocks out of 2 blocks
14/02/26 08:35:23 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote gets in 1 ms
Error in item[[2]] : subscript out of bounds
Calls: do.call ... <Anonymous> -> FUN -> FUN -> lapply -> lapply -> FUN
Execution halted
14/02/26 08:35:25 ERROR Executor: Exception in task ID 6
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
{code}

    h4. Relevant environment variables

{code}
SPARK_HOME=/Users/[user]/Software/spark-0.9.0-incubating-bin-hadoop1/
SPARK_LOCAL_IP=<ip address>
{code}

    Thanks for the help.

    edit: updated the format

    JIRA | 3 years ago | rperko3
    java.lang.NullPointerException
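
    Note that in the worker log above, the R worker process itself dies with {{Error in item[[2]] : subscript out of bounds}} immediately before each JVM-side NullPointerException, which suggests the NPE in {{RRDD.compute}} is a downstream symptom of the R process exiting. As a hypothetical illustration (not taken from the report), indexing past the end of a list is exactly what produces that R error:

{code}
# Hypothetical R snippet: a record with a single field, indexed as if
# it had two. This reproduces the exact "subscript out of bounds"
# error seen in the worker log.
item <- list("word")
item[[1]]  # "word" -- fine
item[[2]]  # Error in item[[2]] : subscript out of bounds
{code}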
  2.

    How to build a logistic regression model in SparkR

    Stack Overflow | 2 years ago
    java.lang.NullPointerException
  3.

    Minimal example:

{noformat}
rdd <- textFile(sc, "./README.md")
lengths <- lapply(rdd, function(x) { length(x) })
take(lengths, 5)    # works
lengths10 <- lapply(lengths, function(x) { x + 10 })
take(lengths10, 2)  # breaks
{noformat}

    Stacktrace:

{noformat}
Exception in thread "stdin writer for R" java.lang.ClassCastException: java.lang.String cannot be cast to [B
    at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:312)
    at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4$$anonfun$run$3.apply(RRDD.scala:310)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at edu.berkeley.cs.amplab.sparkr.RRDD$$anon$4.run(RRDD.scala:310)
Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
  invalid 'n' argument
Calls: unserialize -> readRawLen -> readBin
Execution halted
14/11/17 12:22:31 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NullPointerException
    at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
14/11/17 12:22:31 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
    edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    java.lang.Thread.run(Thread.java:695)
14/11/17 12:22:31 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
Error in .jcall(jrdd, "[Ljava/util/List;", "collectPartitions", .jarray(as.integer(index))) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException:
    edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:128)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    java.lang.Thread.run(Thread.java:695)
{noformat}

    This is likely related to [this line|https://github.com/amplab-extras/SparkR-pkg/blob/master/pkg/R/RDD.R#L122]; changing it to FALSE seems to eliminate the issue. One workaround is to cache the `lengths` RDD first (a sketch follows this entry).
    We should figure out what exactly the issue is, and perhaps in the meantime add more documentation in the code on how pipelining works (e.g. state the invariants on some key variables).

    JIRA | 2 years ago | Zongheng Yang
    java.lang.NullPointerException
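
    As the description above notes, caching the {{lengths}} RDD before deriving {{lengths10}} avoids the crash. A minimal sketch of that workaround, assuming the SparkR-pkg API of the time ({{cache}} alongside {{textFile}}, {{lapply}}, and {{take}}; the {{cache}} call is an assumption, since the report does not show it):

{noformat}
rdd <- textFile(sc, "./README.md")
lengths <- lapply(rdd, function(x) { length(x) })
cache(lengths)      # mark lengths for caching; the first take() materializes it,
                    # so the second lapply is not pipelined onto the first
take(lengths, 5)
lengths10 <- lapply(lengths, function(x) { x + 10 })
take(lengths10, 2)  # reportedly no longer breaks once lengths is cached
{noformat}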

    Root Cause Analysis

    1. java.lang.NullPointerException

      No message provided

      at edu.berkeley.cs.amplab.sparkr.RRDD.compute()
    2. edu.berkeley.cs
      RRDD.compute
      1. edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:117)
      1 frame
    3. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
      2. org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
      3. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
      4. org.apache.spark.scheduler.Task.run(Task.scala:53)
      5. org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
      6. org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
      7. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
      7 frames
    4. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      3. java.lang.Thread.run(Thread.java:724)
      3 frames