java.lang.RuntimeException: Expected exactly one file, got [s3://h2o-datasets/covtype.data, s3://h2o-datasets/covtype.data.gz]

JIRA | Michal Malohlava | 3 months ago
  1. 0

    H2O 3.10.0.6

    val f = new H2OFrame(java.net.URI.create("s3://h2o-datasets/covtype.data"))

    causes this:

    {noformat}
    java.lang.RuntimeException: Expected exactly one file, got [s3://h2o-datasets/covtype.data, s3://h2o-datasets/covtype.data.gz]
        at water.persist.PersistS3.uriToKey(PersistS3.java:326)
        at water.persist.PersistManager.anyURIToKey(PersistManager.java:172)
        at water.util.FrameUtils.parseFrame(FrameUtils.java:56)
        at water.util.FrameUtils.parseFrame(FrameUtils.java:47)
        at water.fvec.H2OFrame.<init>(H2OFrame.scala:66)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
        at $iwC$$iwC$$iwC.<init>(<console>:45)
        at $iwC$$iwC.<init>(<console>:47)
        at $iwC.<init>(<console>:49)
        at <init>(<console>:51)
        at .<init>(<console>:55)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    {noformat}
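
The exception text lists two keys that share the requested name as a prefix, which suggests (the report does not confirm it) that the lookup matches by prefix rather than by exact name. A minimal Python sketch of that behaviour and of an exact-match tie-break is shown here. `keys_with_prefix` and `resolve_single_key` are hypothetical names, the key list is taken from the error message, and none of this is H2O's actual `PersistS3.uriToKey` code.

```python
from typing import Iterable, List


def keys_with_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Return every key that starts with `prefix`, as a prefix listing would."""
    return sorted(k for k in keys if k.startswith(prefix))


def resolve_single_key(keys: Iterable[str], name: str) -> str:
    """Resolve `name` to one key, preferring an exact match over prefix matches."""
    matches = keys_with_prefix(keys, name)
    exact = [k for k in matches if k == name]
    if len(exact) == 1:
        return exact[0]
    if len(matches) == 1:
        return matches[0]
    # Mirrors the wording of the reported error; the logic itself is an assumption.
    raise RuntimeError(f"Expected exactly one file, got {matches}")


# Keys copied from the error message, plus one unrelated hypothetical key.
bucket_keys = ["covtype.data", "covtype.data.gz", "iris.csv"]

ambiguous = keys_with_prefix(bucket_keys, "covtype.data")
resolved = resolve_single_key(bucket_keys, "covtype.data")
```

Under this assumption, a plain prefix listing for `covtype.data` yields both keys, while the exact-match tie-break picks the uncompressed file.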

    JIRA | 3 months ago | Michal Malohlava
    java.lang.RuntimeException: Expected exactly one file, got [s3://h2o-datasets/covtype.data, s3://h2o-datasets/covtype.data.gz]
  2. 0

    I was doing this on ec2. probably same if done elsewhere

    i added some tests on this (we've not ported all va stuff to fvec yet) but there's enough info here so you can test it in a browser

    fvec import/parse of the same file with s3n worked fine

    01:28:40.694 # Session INFO HTTPD: GET /2/ImportFiles2.html path=s3://home-0xdiag-datasets/standard
    01:28:53.539 # Session INFO HTTPD: GET /2/Parse2.query source_key=s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz
    01:29:12.327 # Session INFO HTTPD: GET /2/Parse2.html single_quotes=0 delete_on_done=1 header=0 separator=44 parser_type=CSV destination_key=new_poker_hand_full_311M_txt.hex source_key=s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz

    water.DException$DistributedException: from /10.215.86.4:54321; java.lang.RuntimeException: java.util.zip.ZipException: invalid block type while mapping key s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz
    01:29:17.178 FJ-0-3 INFO WATER: at water.parser.CustomParser$StreamData.getChunkData(CustomParser.java:270)
    01:29:17.178 FJ-0-3 INFO WATER: at water.parser.CsvParser.parallelParse(CsvParser.java:417)
    01:29:17.179 FJ-0-3 INFO WATER: at water.parser.CustomParser.streamParse(CustomParser.java:196)
    01:29:17.179 FJ-0-3 INFO WATER: at water.fvec.ParseDataset2$MultiFileParseTask.streamParse(ParseDataset2.java:556)
    01:29:17.180 FJ-0-3 INFO WATER: at water.fvec.ParseDataset2$MultiFileParseTask.map(ParseDataset2.java:509)
    01:29:17.181 FJ-0-3 INFO WATER: at water.MRTask.lcompute(MRTask.java:75)
    01:29:17.181 FJ-0-3 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    01:29:17.181 FJ-0-3 INFO WATER: at water.H2O$H2OCountedCompleter.compute(H2O.java:710)
    01:29:17.182 FJ-0-3 INFO WATER: at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
    01:29:17.183 FJ-0-3 INFO WATER: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    01:29:17.183 FJ-0-3 INFO WATER: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    01:29:17.184 FJ-0-3 INFO WATER: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    01:29:17.184 FJ-0-3 INFO WATER: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

    some other email I sent:

    this parse worked
    http://ec2-54-196-202-226.compute-1.amazonaws.com:54321/2/Parse2.query?source_key=s3n://home-0xdiag-datasets/standard/1mx10_hastie_10_2.data.gz

    On 06/08/2014 01:35 PM, Kevin wrote:
    > tried another known good gz...
    > same thing
    >
    > water.DException$DistributedException: from /10.215.86.4:54321; java.lang.RuntimeException: java.util.zip.ZipException: invalid block type while mapping key s3://home-0xdiag-datasets/standard/1mx10_hastie_10_2.data.gz
    >
    >
    > Going to try s3n now
    > I guess I should find a .zip also, since that might be different
    >
    >
    > On 06/08/2014 01:33 PM, Kevin wrote:
    >> water.DException$DistributedException: from /10.215.86.4:54321; java.lang.RuntimeException: java.util.zip.ZipException: invalid block type while mapping key s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz
    >>
    >> key is locked after this error
    >> i.e. if I retry without unlocking
    >> from /10.209.21.41:54321; java.lang.IllegalArgumentException: Dataset new_poker_hand_full_311M_txt.hex is already in use. Unable to use it now. Consider using a different destination name.
    >>
    >>
    >> retry after unlocking gets same result
    >> water.DException$DistributedException: from /10.215.86.4:54321; java.lang.RuntimeException: java.util.zip.ZipException: invalid block type while mapping key s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz
    >>
    >>
    >> This is on aws 4 node ec2 cloud
    >>
    >> built with
    >> cd h2o/py
    >> python ec2_cmd.py create
    >> python ec2_cmd.py start_h2o --hosts ec2-config-r-4efa5a31.json
    >>
    >>
    >> my aws credentials are in /home/kevin/.ec2 on that ec2 system
    >>
    >
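
The report compares the same object under the `s3` and `s3n` schemes. A small Python helper for building both variants of a source key, so the two parses can be run side by side, might look like the sketch below. `with_scheme` is a hypothetical name; the URI is the one from the report.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(uri: str, scheme: str) -> str:
    """Return `uri` with its scheme swapped, leaving bucket and key untouched."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(scheme=scheme))


s3_key = "s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz"
s3n_key = with_scheme(s3_key, "s3n")
```

Only the scheme changes, so any difference between the two parses points at the scheme-specific reader rather than at the key.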

    JIRA | 3 years ago | Kevin Normoyle
    java.lang.RuntimeException: java.util.zip.ZipException: invalid block type while mapping key s3://home-0xdiag-datasets/standard/new-poker-hand.full.311M.txt.gz
  4. 0

    I was just running this on one ec2 machine created via

    cd h2o/py
    python ec2_cmd.py --instances 1 create

    That creates a json that I use here

    cd testdir_hosts
    python test_parse_summary_manyfiles_s3_fvec.py -cj ../ec2-config-r-8c5c4bfd.json

    the json file is the config file created by the ec2

    This test is just a port of the s3n version that has been in the jenkins ec2 job (testdir_hosts) for a while

    The s3n version times out if run on just one machine, but the s3 version quickly gets an error, even though it's doing the same thing with same files, just s3 instead of s3n

    although I had tests in the past, I noticed I didn't have the nflx gz manyfile variant in s3 in jenkins (I had s3n)

    I had de-emphasized s3 over s3n

    I changed a s3n test to s3, and I got this zip exception, which was surprising. the commands were straightforward import and pattern match parse

    I'll have to see if the data file is broken. but the same data file is used for the s3n tests

    i.e.

    2014-08-01 14:47:01.123215 -- Start http://10.180.220.236:54321/2/ImportFiles2.json?path=s3://home-0xdiag-datasets/manyfiles-nflx-gz
    2014-08-01 14:47:02.822942 -- Start http://10.180.220.236:54321/StoreView.json?offset=0&view=10000
    2014-08-01 14:47:03.438965 -- Start http://10.180.220.236:54321/2/ImportFiles2.json?path=s3://home-0xdiag-datasets/manyfiles-nflx-gz
    2014-08-01 14:47:04.196352 -- Start http://10.180.220.236:54321/2/Parse2.json?destination_key=manyfiles-nflx-gz_0.hex&source_key=s3://home-0xdiag-datasets/manyfiles-nflx-gz/file_[2][0-9][0-9].dat.gz

    java.lang.RuntimeException: java.util.zip.ZipException: invalid distance code while mapping key s3://home-0xdiag-datasets/manyfiles-nflx-gz/file_200.dat.gz
    02:46:39.621 FJ-0-1 INFO WATER: at water.parser.CustomParser$StreamData.getChunkData(CustomParser.java:270)
    02:46:39.622 FJ-0-1 INFO WATER: at water.parser.CsvParser.parallelParse(CsvParser.java:403)
    02:46:39.622 FJ-0-1 INFO WATER: at water.parser.CustomParser.streamParse(CustomParser.java:196)
    02:46:39.623 FJ-0-1 INFO WATER: at water.fvec.ParseDataset2$MultiFileParseTask.streamParse(ParseDataset2.java:617)
    02:46:39.623 FJ-0-1 INFO WATER: at water.fvec.ParseDataset2$MultiFileParseTask.map(ParseDataset2.java:569)
    02:46:39.624 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:68)
    02:46:39.624 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.625 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.625 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.626 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.627 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.627 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.628 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.628 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.629 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.630 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.630 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.631 FJ-0-1 INFO WATER: at water.MRTask.lcompute(MRTask.java:62)
    02:46:39.631 FJ-0-1 INFO WATER: at water.DRemoteTask.compute2(DRemoteTask.java:91)
    02:46:39.632 FJ-0-1 INFO WATER: at water.H2O$H2OCountedCompleter.compute(H2O.java:714)
    02:46:39.633 FJ-0-1 INFO WATER: at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
    02:46:39.633 FJ-0-1 INFO WATER: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    02:46:39.634 FJ-0-1 INFO WATER: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    02:46:39.635 FJ-0-1 INFO WATER: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    02:46:39.635 FJ-0-1 INFO WATER: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
    02:46:39.636 FJ-0-1 INFO WATER: Caused by: java.util.zip.ZipException: invalid distance code
    02:46:39.637 FJ-0-1 INFO WATER: at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
    02:46:39.638 FJ-0-1 INFO WATER: at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:90)
    02:46:39.638 FJ-0-1 INFO WATER: at water.parser.CustomParser$StreamData.getChunkData(CustomParser.java:264)
    02:46:39.639 FJ-0-1 INFO WATER: ... 24 more
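
The reporter wants to rule out a broken data file. One way to do that outside H2O is to decompress a downloaded copy end to end and see whether the gzip stream reads cleanly. The sketch below does this in Python on in-memory bytes. `gzip_is_readable` is a hypothetical helper, the sample data is synthetic, and a clean result would only say the bytes are a valid gzip stream, not why the s3 path fails.

```python
import gzip
import io
import zlib


def gzip_is_readable(data: bytes, chunk_size: int = 64 * 1024) -> bool:
    """Return True if `data` decompresses cleanly as a complete gzip stream."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
            # Read to the end so trailing CRC and length checks are exercised.
            while stream.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches, EOFError covers
        # truncated streams, zlib.error covers corrupt deflate data.
        return False


# Synthetic stand-ins for a downloaded .gz file (not data from the report).
good = gzip.compress(b"1,2,3\n" * 1000)
truncated = good[: len(good) // 2]
not_gzip = b"1,2,3\n" * 10

results = [gzip_is_readable(b) for b in (good, truncated, not_gzip)]
```

If the file passes a check like this locally but still fails under `s3://`, the bytes delivered to the parser are a more likely suspect than the stored object.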

    JIRA | 2 years ago | Kevin Normoyle
    java.lang.RuntimeException: java.util.zip.ZipException: invalid distance code while mapping key s3://home-0xdiag-datasets/manyfiles-nflx-gz/file_200.dat.gz

    Root Cause Analysis

    1. java.lang.RuntimeException

      Expected exactly one file, got [s3://h2o-datasets/covtype.data, s3://h2o-datasets/covtype.data.gz]

      at water.persist.PersistS3.uriToKey()
    2. water.persist
      PersistManager.anyURIToKey
      1. water.persist.PersistS3.uriToKey(PersistS3.java:326)
      2. water.persist.PersistManager.anyURIToKey(PersistManager.java:172)
      2 frames
    3. water.util
      FrameUtils.parseFrame
      1. water.util.FrameUtils.parseFrame(FrameUtils.java:56)
      2. water.util.FrameUtils.parseFrame(FrameUtils.java:47)
      2 frames
    4. water.fvec
      H2OFrame.<init>
      1. water.fvec.H2OFrame.<init>(H2OFrame.scala:66)
      1 frame
    5. Unknown
      $iwC.<init>
      1. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
      2. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
      3. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
      4. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
      5. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
      6. $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
      7. $iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
      8. $iwC$$iwC$$iwC.<init>(<console>:45)
      9. $iwC$$iwC.<init>(<console>:47)
      10. $iwC.<init>(<console>:49)
      10 frames