org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, 162.44.115.223): java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)

JIRA | Sudipta | 1 year ago
  1.

    [TACHYON-1352] java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31) - JIRA

    atlassian.net | 11 months ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, 162.44.115.223): java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
  2.

    Spark client 1.5.1 is trying to save a file at a remote (non-HDFS) location: a Tachyon server running in local mode.

    1. We are able to complete the runTests successfully.
    2. Spark (StorageLevel.OFF_HEAP) is also working properly.
    3. During rdd.saveAsTextFile("tachyon://*.*.*.*:19998/test_dev1") we get the error below.
    4. Log information is available below:

    scala> rdd.saveAsTextFile("tachyon://*.*.*.*:19998/test_dev")
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, 162.44.115.223): java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
        at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:130)
        at tachyon.client.TachyonFS.addCheckpoint(TachyonFS.java:228)
        at tachyon.client.FileOutStream.close(FileOutStream.java:105)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
        at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
        at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:103)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1117)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1215)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
        at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3509)
        at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3477)
        at tachyon.thrift.WorkerService$addCheckpoint_result.read(WorkerService.java:3403)
        at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at tachyon.thrift.WorkerService$Client.recv_addCheckpoint(WorkerService.java:221)
        at tachyon.thrift.WorkerService$Client.addCheckpoint(WorkerService.java:207)
        at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:124)
        ... 16 more

    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1912)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1124)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1426)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1405)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1405)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1405)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $iwC$$iwC$$iwC.<init>(<console>:38)
        at $iwC$$iwC.<init>(<console>:40)
        at $iwC.<init>(<console>:42)
        at <init>(<console>:44)
        at .<init>(<console>:48)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
        at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:130)
        at tachyon.client.TachyonFS.addCheckpoint(TachyonFS.java:228)
        at tachyon.client.FileOutStream.close(FileOutStream.java:105)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
        at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
        at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:103)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1117)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1215)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
        at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3509)
        at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3477)
        at tachyon.thrift.WorkerService$addCheckpoint_result.read(WorkerService.java:3403)
        at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at tachyon.thrift.WorkerService$Client.recv_addCheckpoint(WorkerService.java:221)
        at tachyon.thrift.WorkerService$Client.addCheckpoint(WorkerService.java:207)
        at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:124)

    JIRA | 1 year ago | Sudipta
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, 162.44.115.223): java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)
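
    The report above boils down to two API calls: persist an RDD with StorageLevel.OFF_HEAP, then save it to a tachyon:// URI. A minimal sketch of that flow against the Spark 1.5 API follows; the master host, the spark.externalBlockStore.url value, and the sample data are placeholders, not values from the original report.

        // Hypothetical reproduction sketch (Spark 1.5.x API). Host, port,
        // and data below are placeholders, not values from the report.
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.storage.StorageLevel

        object TachyonSaveRepro {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()
              .setAppName("TachyonSaveRepro")
              // Spark 1.5 configures the Tachyon-backed OFF_HEAP store via
              // the spark.externalBlockStore.* properties.
              .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
            val sc = new SparkContext(conf)
            // The Tachyon client jar must be on the classpath for the
            // tachyon:// scheme; per the Tachyon docs the Hadoop-compatible
            // filesystem class is tachyon.hadoop.TFS.
            sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")

            val rdd = sc.parallelize(1 to 1000).map(i => s"record-$i")
            rdd.persist(StorageLevel.OFF_HEAP) // step 2 in the report: works
            // Step 3 in the report: fails at task close() with
            // FailedToCheckpointException.
            rdd.saveAsTextFile("tachyon://tachyon-master:19998/test_dev1")
            sc.stop()
          }
        }
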
  3.

    Exceptions: The machine does not have any local worker (when running Spark on top of Tachyon-0.6.4)

    Google Groups | 2 years ago | Tianyin Xu
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 3, localhost): java.io.IOException: FailedToCheckpointException(message:Failed to rename hdfs://xxx.xxx.xxx.xxx:9000/tmp/tachyon/workers/1432692000002/4/17 to hdfs://xxx.xxx.xxx.xxx:9000/tmp/tachyon/data/17)
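
    This entry and the one above fail at the same point: the worker-side rename from Tachyon's workers staging directory into its data directory in the under filesystem. Assuming the under filesystem is reachable through the Hadoop FileSystem API (as with the hdfs:// paths here), a hypothetical probe like the sketch below can test that rename in isolation; the namenode address and probe paths are placeholders.

        // Hypothetical diagnostic: can the under filesystem rename a file
        // from the workers staging dir to the data dir? The paths mirror
        // the report's layout but are placeholders.
        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object UfsRenameProbe {
          def main(args: Array[String]): Unit = {
            val ufs = "hdfs://namenode:9000" // or a local path for a local underfs
            val fs = FileSystem.get(new URI(ufs), new Configuration())
            val src = new Path(s"$ufs/tmp/tachyon/workers/probe")
            val dst = new Path(s"$ufs/tmp/tachyon/data/probe")
            fs.create(src).close() // create a zero-byte probe file
            // FileSystem.rename usually reports failure by returning false
            // (e.g. a missing destination directory) rather than throwing.
            println(s"rename succeeded: ${fs.rename(src, dst)}")
            fs.delete(dst, false)
          }
        }
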
  4.

    GitHub comment 67#230696197

    GitHub | 5 months ago | guchensmile
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 7, heracles): java.lang.NullPointerException

    Root Cause Analysis

    1. org.apache.spark.SparkException

      Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, 162.44.115.223): java.io.IOException: FailedToCheckpointException(message:Failed to rename /tmp/tmp/tachyon/workers/1448540000001/7/31 to /tmp/tmp/tachyon/data/31)

      at tachyon.worker.WorkerClient.addCheckpoint()
    2. Tachyon Project Core
      FileOutStream.close
      1. tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:130)
      2. tachyon.client.TachyonFS.addCheckpoint(TachyonFS.java:228)
      3. tachyon.client.FileOutStream.close(FileOutStream.java:105)
      3 frames
    3. Hadoop
      FSDataOutputStream.close
      1. org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
      2. org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
      2 frames
    4. Hadoop
      TextOutputFormat$LineRecordWriter.close
      1. org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
      1 frame
    5. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:103)
      2. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1117)
      3. org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1215)
      4. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
      5. org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
      6. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      7. org.apache.spark.scheduler.Task.run(Task.scala:88)
      8. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      8 frames
    6. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames
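
    The frame breakdown above shows why the job fails only when output is committed: the writes themselves succeed, and the IOException surfaces when the record writer is closed, because FileOutStream.close() asks the Tachyon worker to addCheckpoint, i.e. to rename the finished block into the data directory. A hedged sketch of that same close-time path through the Hadoop-compatible client follows; the host, output path, and the fs.tachyon.impl setting are assumptions based on Tachyon's Hadoop integration docs, not part of the trace.

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object CloseTimeFailure {
          def main(args: Array[String]): Unit = {
            val conf = new Configuration()
            // Tachyon's Hadoop-compatible client class, per its docs.
            conf.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
            val fs = FileSystem.get(new URI("tachyon://tachyon-master:19998"), conf)

            val out = fs.create(new Path("/test_dev1/part-00000"))
            out.write("hello\n".getBytes("UTF-8"))
            // In the reported trace the failure is raised here, not in write():
            // close() triggers WorkerClient.addCheckpoint, whose rename into
            // the under filesystem's data directory fails on the worker.
            out.close()
          }
        }
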