org.apache.spark.SparkException: Job aborted.

  1. I'm trying to copy a large number of files from HDFS to S3 via distcp and I'm getting the following exception:

     {code:java}
     2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
     java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
         at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
     2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
         at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
     {code}

     However, when I run hadoop fs -ls s3n://s3-bucket/file.gz, the file is there. So the job failure is probably due to Amazon S3's eventual consistency. In my opinion, to fix this problem NativeS3FileSystem.getFileStatus must use the fs.s3.maxRetries property to avoid failures like this. (A sketch of this proposed retry appears after the result list below.)

     Apache's JIRA Issue Tracker | 2 years ago | Paulo Motta
     org.apache.spark.SparkException: Job aborted.
  2. mongo-hadoop and hadoop-0.23.1

     Google Groups | 5 years ago | Mark Lewandowski
     java.io.FileNotFoundException: File file:/tmp/_temporary/0 does not exist
  3. FileNotFoundException when attempting to write to an Amazon S3 location; file is .gz even with no codec specified

     GitHub | 9 months ago | shriyaarora
     java.io.FileNotFoundException: File s3://baseDirecotry/path.csv/_temporary/0 does not exist.
  4. FileNotFoundException on saveAsNewHadoopFile

     Stack Overflow | 10 months ago | swinefish
     java.io.FileNotFoundException: File file:/tmp/hfiles-06-46-57/_temporary/0/_temporary/attempt_201602150647_0019_r_000000_25/f1 does not exist
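The retry proposed in result 1 is straightforward to sketch. The snippet below is a minimal, hypothetical illustration, assuming a standalone helper class; it is not the actual Hadoop fix. It wraps FileSystem.getFileStatus in a retry loop bounded by the fs.s3.maxRetries property, so that a freshly written key that S3 has not yet made visible gets a few more chances before the task fails:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, for illustration only: retry getFileStatus to ride out
// S3 eventual consistency, honoring fs.s3.maxRetries as the JIRA report suggests.
public class EventuallyConsistentStatus {
    public static FileStatus getFileStatusWithRetries(FileSystem fs, Path path, Configuration conf)
            throws IOException, InterruptedException {
        int maxRetries = conf.getInt("fs.s3.maxRetries", 4); // Hadoop's default is 4
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fs.getFileStatus(path);
            } catch (FileNotFoundException e) {
                // The key may simply not be visible yet; back off and try again.
                lastFailure = e;
                Thread.sleep(1000L * (attempt + 1));
            }
        }
        throw lastFailure;
    }
}
{code}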

    Root Cause Analysis

    1. java.io.FileNotFoundException

      File s3n://foo-hive/warehouse/fooabcxyz0719/_temporary/0/task_201607210010_0005_m_000041 does not exist.

      at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus()
    2. Hadoop
      NativeS3FileSystem.listStatus
      1. org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:506)
      1 frame
    3. Hadoop
      FileOutputCommitter.commitJob
      1. org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
      2. org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
      2 frames
    4. org.apache.parquet
      ParquetOutputCommitter.commitJob
      1. org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
      1 frame
    5. org.apache.spark
      InsertIntoHadoopFsRelation$$anonfun$run$1.apply
      1. org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
      2. org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:149)
      3. org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
      4. org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
      4 frames
    6. Spark Project SQL
      SQLExecution$.withNewExecutionId
      1. org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
      1 frame
    7. org.apache.spark
      InsertIntoHadoopFsRelation.run
      1. org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)
      1 frame
    8. Spark Project SQL
      SparkPlan$$anonfun$execute$5.apply
      1. org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
      2. org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
      3. org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
      4. org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
      5. org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
      5 frames
    9. Spark
      RDDOperationScope$.withScope
      1. org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
      1 frame
    10. Spark Project SQL
      DataFrameWriter.saveAsTable
      1. org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
      2. org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
      3. org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
      4. org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
      5. org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
      6. org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
      6 frames
    11. com.foo.vAnalytics
      xyz_load.main
      1. com.foo.vAnalytics.xyz_load$.main(xyz_load.scala:130)
      2. com.foo.vAnalytics.xyz_load.main(xyz_load.scala)
      2 frames
    12. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:606)
      4 frames
    13. Spark
      SparkSubmit.main
      1. org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
      2. org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
      3. org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
      4. org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
      5. org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      5 frames
    14. org.apache.oozie
      SparkMain.main
      1. org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
      2. org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
      3. org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
      4. org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
      4 frames
    15. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:606)
      4 frames
    16. org.apache.oozie
      LauncherMapper.map
      1. org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
      1 frame
    17. Hadoop
      LocalContainerLauncher$SubtaskRunner.run
      1. org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
      2. org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
      3. org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
      4. org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
      5. org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
      5 frames
    18. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:745)
      1 frame
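The breakdown above shows the job dying in FileOutputCommitter.commitJob while it lists the job's _temporary directory on s3n: the task output was written, but the subsequent listing lags behind it. A common stopgap, assuming you control the job's Hadoop configuration, is to raise the S3 client's retry budget before the write, or to write to HDFS first and copy to S3 afterwards. The property names below are real Hadoop keys; the values are only illustrative:

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class S3RetryTuning {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("s3-retry-tuning"));
        // Give the s3/s3n client more chances to see a just-written key.
        sc.hadoopConfiguration().set("fs.s3.maxRetries", "10");       // default: 4
        sc.hadoopConfiguration().set("fs.s3.sleepTimeSeconds", "10"); // default: 10
        // ... perform the DataFrame write / saveAsTable against s3n:// here ...
        sc.stop();
    }
}
{code}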