java.lang.RuntimeException

  • Here is my code:

        public class WarcTest {
            public static void main(String[] args) {
                String warcPath = "c:\\temp\\pipeData\\warc\\test\\11.warc";
                WARCReader warcReader;
                try {
                    warcReader = WARCReaderFactory.get(new File(warcPath));
                    Iterator<ArchiveRecord> it = warcReader.iterator();
                    while (it.hasNext()) {
                        ArchiveRecord record = it.next();
                        record.dump();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    And here is the WARC file:

        WARC/0.18
        WARC-Type: warcinfo
        WARC-Date: 2009-07-15T06:56:30Z
        WARC-Filename: 1.warc
        WARC-Record-ID: <urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>
        Content-Type: application/warc-fields
        Content-Length: 80

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800

        WARC/0.18
        WARC-Type: resource
        WARC-Target-URI: http://vteoria.com/dif/warc_shaul.html
        WARC-Date: 20090715065512
        IP-Address: 72.167.131.216
        WARC-Record-ID: <urn:uuid:63d42fd5-0fae-4a15-81a2-1443e394b449>
        Content-Type: application/http; msgtype=response
        Content-Length: 673

        HTTP/1.1 200 OK
        Date: Wed, 15 Jul 2009 06:55:12 GMT
        Server: Apache
        Connection: close
        Content-Type: text/html

        <HTML>
        <HEAD>
        <TITLE>Your Title Here</TITLE>
        </HEAD>
        <BODY BGCOLOR="FFFFFF">
        <HR>
        <a href="http://somegreatsite.com">Link Name</a>
        is a link to another nifty site
        <H1>This is a Header</H1>
        <H2>This is a Medium Header</H2>
        Send me mail at <a href="mailto:support@yourcompany.com">
        support@yourcompany.com</a>.
        <P> This is a new paragraph!
        <P> <B>This is a new paragraph!</B>
        <BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B>
        <HR>
        </BODY>
        </HTML>

    The errors that I get are:

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character a(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character 41(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator next
        WARNING: Bad Record. Trying skip (Current offset 296): Unexpected character 57(Expecting d)
        Exception in thread "main" java.lang.RuntimeException: After retry (Offset 296)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:535)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
            at example.WarcTest.main(WarcTest.java:23)
        Caused by: java.io.IOException: Unexpected character 52(Expecting d)
            at org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
            at org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
            at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
            at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
            ... 2 more

    I think I solved this by modifying the class ArchiveRecord and changing the "read" method to be:

        public int read(byte[] b, int offset, int length) throws IOException {
            int read = Math.min(length, available());
            if (read != -1 && read != 0) {
                read = this.in.read(b, offset, read);
                if (read == -1) {
                    String msg = "Premature EOF before end-of-record: "
                            + getHeader().getHeaderFields();
                    if (isStrict()) {
                        throw new IOException(msg);
                    }
                    setEor(true);
                    System.err.println(Level.WARNING.toString() + " " + msg);
                }
                if (this.digest != null && read >= 0) {
                    this.digest.update(b, offset, read);
                }
            }
            /*
             * Shaul K. set the read to -1 only after the actual increment is done.
             */
            incrementPosition(read);
            if (read == -1 || read == 0) {
                read = -1;
            }
            return read;
        }

    Can you verify? Otherwise, what am I doing wrong?
    by Shaul Kushelevsky
  • GitHub comment 17#71220467
    via GitHub by machawk1
  • The below was reported by Olaf Freyer on the mailing list:

    With the release candidate I seem to be unable to use the v10 WARCReader via the console (I have not tested whether it also fails when used from Java). Here is how I usually use the WARCReader (updated to the new package structure). My warcreader shell script basically contains:

        FOREGROUND='true' CLASS_MAIN='org.archive.io.warc.v10.WARCReader' JMX_OFF='off' $HERITRIX_HOME/bin/heritrix

    Now I call:

        sh warcreader -f dump myWARC.warc

    Here is what I get:

        java.lang.ClassCastException: org.archive.io.warc.WARCReaderFactory$UncompressedWARCReader cannot be cast to org.archive.io.warc.v10.WARCReader
            at org.archive.io.warc.v10.WARCReaderFactory.get(WARCReaderFactory.java:61)
            at org.archive.io.warc.v10.WARCReader.main(WARCReader.java:298)

    Also note that I seem to be unable to use the "dump" option of the WARCReader of heritrix-1.10.2, too. Even though it at least starts up, I get the following error:

        Exception processing myWARC.warc: java.io.IOException: Unexpected character a(Expecting d)
        java.lang.RuntimeException: java.io.IOException: Unexpected character a(Expecting d)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.hasNext(ArchiveReader.java:462)
            at org.archive.io.warc.WARCReader.dump(WARCReader.java:104)
            at org.archive.io.ArchiveReader.output(ArchiveReader.java:627)
            at org.archive.io.warc.WARCReader.output(WARCReader.java:156)
            at org.archive.io.warc.WARCReader.main(WARCReader.java:300)
        Caused by: java.io.IOException: Unexpected character a(Expecting d)
            at org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:81)
            at org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:71)
            at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:190)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.hasNext(ArchiveReader.java:460)
            ... 4 more

    Thanks in advance for any help/advice,
    Olaf Freyer
    by Michael Stack
    • java.lang.RuntimeException: After retry (Offset 296)
          at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:535)
          at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
          at example.WarcTest.main(WarcTest.java:23)
      Caused by: java.io.IOException: Unexpected character 52(Expecting d)
          at org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
          at org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
          at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
          at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
          at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
          at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
          at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
          ... 2 more
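
The values in the "Unexpected character X(Expecting Y)" messages quoted above (a, d, 41, 57, 52) look like hexadecimal byte values, though nothing in the reports states this outright. Under that assumption, the small sketch below names each byte. The class `UnexpectedCharDecoder` and its `describe` helper are hypothetical and are not part of Heritrix.

```java
public class UnexpectedCharDecoder {

    // Interprets a value from the error message as hexadecimal
    // (an assumption, see the text above) and returns a readable name.
    static String describe(String hex) {
        int value = Integer.parseInt(hex, 16);
        switch (value) {
            case 0x0a:
                return "LF";
            case 0x0d:
                return "CR";
            default:
                // Printable ASCII is shown as a quoted character,
                // anything else as a two-digit hex literal.
                return (value >= 0x20 && value < 0x7f)
                        ? "'" + (char) value + "'"
                        : String.format("0x%02x", value);
        }
    }

    public static void main(String[] args) {
        // Values copied from the messages in the reports.
        String[] reported = {"a", "d", "41", "57", "52"};
        for (String hex : reported) {
            System.out.println(hex + " -> " + describe(hex));
        }
    }
}
```

Read this way, the reader expected CR (d) and instead found LF (a) or the letters 'A', 'W' and 'R'. That would be consistent with the reader landing inside the next "WARC/0.18" header line, or on a bare LF line ending, at the point where it expected the CRLF record terminator. A mismatch between Content-Length and the actual body size, or LF-only line endings in the file, are therefore plausible things to check, but neither is confirmed by the reports.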
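
The patched read method quoted in the first report turns on one question: when a length-bounded record stream should return -1, and how the position counter should advance. The sketch below shows that general pattern with a self-contained, hypothetical `BoundedRecordStream` built only on `java.io`. It is not the Heritrix `ArchiveRecord` implementation and does not verify the proposed patch; it only illustrates a read that never advances the position by a negative or zero count and reports end-of-record once the declared length is consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: a stream limited to a declared record length.
public class BoundedRecordStream extends InputStream {
    private final InputStream in;
    private final long length; // declared body length of the record
    private long position;     // bytes handed to the caller so far

    BoundedRecordStream(InputStream in, long length) {
        this.in = in;
        this.length = length;
    }

    @Override
    public int available() {
        return (int) Math.min(Integer.MAX_VALUE, length - position);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int offset, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int want = Math.min(len, available());
        if (want <= 0) {
            return -1; // declared length fully consumed: end of record
        }
        int n = in.read(b, offset, want);
        if (n == -1) {
            throw new IOException("Premature EOF before end-of-record");
        }
        position += n; // advance only by bytes actually read
        return n;
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        // Ten body bytes followed by a terminator and the next header.
        byte[] data = "0123456789\r\n\r\nWARC/0.18"
                .getBytes(StandardCharsets.US_ASCII);
        BoundedRecordStream rec =
                new BoundedRecordStream(new ByteArrayInputStream(data), 10);
        byte[] buf = new byte[4];
        int total = 0;
        for (int n; (n = rec.read(buf, 0, buf.length)) != -1; ) {
            total += n;
        }
        System.out.println("read " + total + " bytes, position " + rec.getPosition());
    }
}
```

With the sample data, the loop stops after the ten declared body bytes and leaves the underlying stream positioned at the record terminator, which is the state a reader needs before it checks for the trailing CRLF pair.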