java.io.FileNotFoundException: File does not exist: /test/ABC/ABC

JIRA | Calvin Jia | 2 years ago
  1.

    When running a Spark job against a file that exists in the under file system but has not yet been loaded into Tachyon, the user gets a FileNotFoundException until the file is loaded. The under-fs path appears to be computed incorrectly: the Tachyon path places the file inside a directory of the same name, so /test/ABC is looked up as /test/ABC/ABC.

    To reproduce:
    1. Run Tachyon on top of HDFS.
    2. Place a file into HDFS but do not sync Tachyon.
    3. Run Spark with Tachyon client version 0.6.
    4. Run a Spark job (e.g. count) on the file in HDFS.

    {code:java}
    java.io.FileNotFoundException: File does not exist: /test/ABC/ABC
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
        at tachyon.hadoop.HdfsFileInputStream.getHdfsInputStream(HdfsFileInputStream.java:101)
        at tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:246)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:87)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:51)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:695)
    {code}

    JIRA | 2 years ago | Calvin Jia
    java.io.FileNotFoundException: File does not exist: /test/ABC/ABC
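
    The report suggests the client derives the under-storage path by appending the file's own name to its Tachyon path, so /test/ABC is resolved as /test/ABC/ABC. A minimal, hypothetical sketch of that kind of path-doubling bug (the method names here are illustrative, not Tachyon's actual API):

    ```java
    import java.nio.file.Paths;

    public class UnderFsPathBug {
        // Buggy resolution: treats the Tachyon file path as a directory
        // and joins the file's own name onto it a second time.
        static String buggyUnderFsPath(String tachyonPath) {
            String name = Paths.get(tachyonPath).getFileName().toString();
            return tachyonPath + "/" + name;   // "/test/ABC" -> "/test/ABC/ABC"
        }

        // Correct resolution: the under-storage path mirrors the Tachyon path.
        static String correctUnderFsPath(String tachyonPath) {
            return tachyonPath;                // "/test/ABC" -> "/test/ABC"
        }

        public static void main(String[] args) {
            System.out.println(buggyUnderFsPath("/test/ABC"));   // /test/ABC/ABC
            System.out.println(correctUnderFsPath("/test/ABC")); // /test/ABC
        }
    }
    ```

    The doubled path matches the one in the exception message, which is why HDFS reports the file as missing even though /test/ABC exists.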
  3.

    FileNotFoundException on hadoop

    Stack Overflow | 4 years ago | stholy
    java.io.FileNotFoundException: File does not exist: /app/hadoop/jobs/nw_single_pred_in/predict
  5.

    Never ending transitioning regions.

    Google Groups | 4 years ago | Jean-Marc Spaggiari
    java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
  6.

    Re: Spark mode file not found

    incubator-mrql-user | 2 years ago | Etienne Dumoulin
    java.io.FileNotFoundException: File does not exist: /tmp/hadoop_data_source_dir.txt*

    Root Cause Analysis

    1. java.io.FileNotFoundException

      File does not exist: /test/ABC/ABC

      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo()
    2. Apache Hadoop HDFS
      DistributedFileSystem.open
      1. org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
      2. org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
      3. org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
      4. org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
      4 frames
    3. Tachyon Project Core
      HdfsFileInputStream.seek
      1. tachyon.hadoop.HdfsFileInputStream.getHdfsInputStream(HdfsFileInputStream.java:101)
      2. tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:246)
      2 frames
    4. Hadoop
      FSDataInputStream.seek
      1. org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
      1 frame
    5. Hadoop
      TextInputFormat.getRecordReader
      1. org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:87)
      2. org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:51)
      2 frames
    6. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
      2. org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
      3. org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
      4. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
      5. org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
      6. org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
      7. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
      8. org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
      9. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
      10. org.apache.spark.scheduler.Task.run(Task.scala:64)
      11. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
      11 frames
    7. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      3. java.lang.Thread.run(Thread.java:695)
      3 frames
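
    The frame breakdown above also explains why the failure surfaces inside seek rather than when the file is first opened: the Tachyon stream defers opening the underlying HDFS stream until HdfsFileInputStream.seek calls getHdfsInputStream, so a missing under-fs file is only discovered once the Spark task touches the data. A rough local-file analogue of that lazy-open pattern (plain java.io, not the Hadoop or Tachyon API):

    ```java
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    // Simplified analogue of a stream that defers opening its backing
    // file until the first seek, as HdfsFileInputStream does.
    public class LazyInputStream {
        private final String path;
        private FileInputStream in;  // opened lazily

        public LazyInputStream(String path) {
            this.path = path;        // no I/O yet: a bad path is NOT detected here
        }

        public void seek(long pos) throws IOException {
            if (in == null) {
                // A missing file fails here, on first use, mirroring the
                // seek -> open -> FileNotFoundException chain in the trace.
                in = new FileInputStream(path);
            }
            in.skip(pos);
        }
    }
    ```

    Constructing the stream succeeds even for a nonexistent path; the FileNotFoundException only appears once the job first reads or seeks, which is why the error shows up mid-task rather than at job setup.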