java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:53548/tmp/dataset/test/simplepojo/c5716ae2-df6a-4ce1-b240-85255d40d728.parquet, expected: file:///

Spring JIRA | Janne Valkealahti | 2 years ago
  1.

    In Spring Hadoop it's common to let the framework create the Hadoop Configuration itself instead of relying on the classpath and whatever gets set when the Configuration class is instantiated. With the Kite SDK and its use of ParquetReader/Writer it is not possible to pass your own custom Configuration. The stacktrace below is thrown from the Reader when tests are run with Hadoop's minicluster, where the Configuration is provided by Hadoop itself; the use case is similar when Spring Hadoop creates its own custom Configuration.
    {code}
    java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:53548/tmp/dataset/test/simplepojo/c5716ae2-df6a-4ce1-b240-85255d40d728.parquet, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
        at parquet.hadoop.ParquetReader.<init>(ParquetReader.java:95)
        at parquet.hadoop.ParquetReader.<init>(ParquetReader.java:79)
        at parquet.hadoop.ParquetReader.<init>(ParquetReader.java:59)
        at parquet.avro.AvroParquetReader.<init>(AvroParquetReader.java:36)
        at org.kitesdk.data.spi.filesystem.ParquetFileSystemDatasetReader.open(ParquetFileSystemDatasetReader.java:67)
        at org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.openNextReader(MultiFileDatasetReader.java:92)
        at org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.hasNext(MultiFileDatasetReader.java:106)
        at org.springframework.data.hadoop.store.dataset.DatasetTemplate.readGenericRecords(DatasetTemplate.java:232)
    {code}
    Let's check the call stack:
    {code}
    Thread [main] (Suspended (breakpoint at line 95 in ParquetReader))
        AvroParquetReader<T>(ParquetReader<T>).<init>(Configuration, Path, ReadSupport<T>, UnboundRecordFilter) line: 95
        AvroParquetReader<T>(ParquetReader<T>).<init>(Path, ReadSupport<T>, UnboundRecordFilter) line: 79
        AvroParquetReader<T>(ParquetReader<T>).<init>(Path, ReadSupport<T>) line: 59
        AvroParquetReader<T>.<init>(Path) line: 36
        ParquetFileSystemDatasetReader<E>.open() line: 67
        MultiFileDatasetReader<E>.openNextReader() line: 92
        MultiFileDatasetReader<E>.hasNext() line: 106
        DatasetTemplate.readGenericRecords(Class<T>, PartitionKey) line: 232
        DatasetTemplate.read(Class<T>) line: 137
        DatasetTemplateParquetTests.testSavePojo() line: 101
    {code}
    The culprit seems to be:
    {code:java}
    public ParquetReader(Path file, ReadSupport<T> readSupport, UnboundRecordFilter filter) throws IOException {
      this(new Configuration(), file, readSupport, filter);
    }
    {code}
    which is where the call from org.kitesdk.data.spi.filesystem.ParquetFileSystemDatasetReader.open() ends up. There is a constructor along the way that accepts a Hadoop Configuration, but Kite doesn't allow it to be used, so the reader defaults to whatever a freshly instantiated Configuration contains. The Path for the file itself has the correct hdfs URI. There is a similar problem with the Writer, but there the correct URI in the Path seems to be enough; with the Reader, the listStatus call fails because the default Configuration (without core-site.xml on the classpath) points to file:// while the URI in the Path points to hdfs://.

    Spring JIRA | 2 years ago | Janne Valkealahti
    java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:53548/tmp/dataset/test/simplepojo/c5716ae2-df6a-4ce1-b240-85255d40d728.parquet, expected: file:///
  3.

    AvroParquetReader only reads from local file system

    GitHub | 3 years ago | doug-explorys
    java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8021/dir/file.prq, expected: file:///
  5.

    Hadoop Pipes Error

    Google Groups | 6 years ago | Adarsh Sharma
    java.lang.IllegalArgumentException: Wrong FS: hdfs://ws-test:54310/user/hadoop/gutenberg, expected: file:
  6.

    A problem when I run with hdfs - Dato Forum

    dato.com | 1 year ago
    java.lang.IllegalArgumentException: Wrong FS: hdfs://hydra:9000/home/lyuwei/hadoop-tmp/data, expected: file:///


    Root Cause Analysis

    1. java.lang.IllegalArgumentException

      Wrong FS: hdfs://localhost:53548/tmp/dataset/test/simplepojo/c5716ae2-df6a-4ce1-b240-85255d40d728.parquet, expected: file:///

      at org.apache.hadoop.fs.FileSystem.checkPath()
    2. Hadoop
      ChecksumFileSystem.listStatus
      1. org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
      2. org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
      3. org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
      4. org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
      5. org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
      6. org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
      6 frames
    3. Parquet
      ParquetReader.<init>
      1. parquet.hadoop.ParquetReader.<init>(ParquetReader.java:95)
      2. parquet.hadoop.ParquetReader.<init>(ParquetReader.java:79)
      3. parquet.hadoop.ParquetReader.<init>(ParquetReader.java:59)
      3 frames
    4. parquet.avro
      AvroParquetReader.<init>
      1. parquet.avro.AvroParquetReader.<init>(AvroParquetReader.java:36)
      1 frame
    5. Kite Data Core Module
      MultiFileDatasetReader.hasNext
      1. org.kitesdk.data.spi.filesystem.ParquetFileSystemDatasetReader.open(ParquetFileSystemDatasetReader.java:67)
      2. org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.openNextReader(MultiFileDatasetReader.java:92)
      3. org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.hasNext(MultiFileDatasetReader.java:106)
      3 frames
    6. Spring for Apache Hadoop Store Features
      DatasetTemplate.readGenericRecords
      1. org.springframework.data.hadoop.store.dataset.DatasetTemplate.readGenericRecords(DatasetTemplate.java:232)
      1 frame