java.lang.RuntimeException: After retry (Offset 296)

JIRA | Shaul Kushelevsky | 8 years ago
  1.

    Here is my code:

        package example;

        import java.io.File;
        import java.io.IOException;
        import java.util.Iterator;

        import org.archive.io.ArchiveRecord;
        import org.archive.io.warc.WARCReader;
        import org.archive.io.warc.WARCReaderFactory;

        public class WarcTest {
            public static void main(String[] args) {
                String warcPath = "c:\\temp\\pipeData\\warc\\test\\11.warc";
                WARCReader warcReader;
                try {
                    warcReader = WARCReaderFactory.get(new File(warcPath));
                    // Walk every record in the file and print it
                    Iterator<ArchiveRecord> it = warcReader.iterator();
                    while (it.hasNext()) {
                        ArchiveRecord record = it.next();
                        record.dump();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    And here is the WARC file:

        WARC/0.18
        WARC-Type: warcinfo
        WARC-Date: 2009-07-15T06:56:30Z
        WARC-Filename: 1.warc
        WARC-Record-ID: <urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>
        Content-Type: application/warc-fields
        Content-Length: 80

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800

        WARC/0.18
        WARC-Type: resource
        WARC-Target-URI: http://vteoria.com/dif/warc_shaul.html
        WARC-Date: 20090715065512
        IP-Address: 72.167.131.216
        WARC-Record-ID: <urn:uuid:63d42fd5-0fae-4a15-81a2-1443e394b449>
        Content-Type: application/http; msgtype=response
        Content-Length: 673

        HTTP/1.1 200 OK
        Date: Wed, 15 Jul 2009 06:55:12 GMT
        Server: Apache
        Connection: close
        Content-Type: text/html

        <HTML>
        <HEAD>
        <TITLE>Your Title Here</TITLE>
        </HEAD>
        <BODY BGCOLOR="FFFFFF">
        <HR>
        <a href="http://somegreatsite.com">Link Name</a>
        is a link to another nifty site
        <H1>This is a Header</H1>
        <H2>This is a Medium Header</H2>
        Send me mail at <a href="mailto:support@yourcompany.com">
        support@yourcompany.com</a>.
        <P> This is a new paragraph!
        <P> <B>This is a new paragraph!</B>
        <BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B>
        <HR>
        </BODY>
        </HTML>

    The errors I get are:

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character a(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character 41(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator next
        WARNING: Bad Record. Trying skip (Current offset 296): Unexpected character 57(Expecting d)
        Exception in thread "main" java.lang.RuntimeException: After retry (Offset 296)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:535)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
            at example.WarcTest.main(WarcTest.java:23)
        Caused by: java.io.IOException: Unexpected character 52(Expecting d)
            at org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
            at org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
            at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
            at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
            ... 2 more

    I think I solved this by modifying the class ArchiveRecord and changing its read method to:

        public int read(byte[] b, int offset, int length) throws IOException {
            int read = Math.min(length, available());
            if (read != -1 && read != 0) {
                read = this.in.read(b, offset, read);
                if (read == -1) {
                    String msg = "Premature EOF before end-of-record: "
                        + getHeader().getHeaderFields();
                    if (isStrict()) {
                        throw new IOException(msg);
                    }
                    setEor(true);
                    System.err.println(Level.WARNING.toString() + " " + msg);
                }
                if (this.digest != null && read >= 0) {
                    this.digest.update(b, offset, read);
                }
            }
            /*
             * Shaul K. set the read to -1 only after the actual increment is done.
             */
            incrementPosition(read);
            if (read == -1 || read == 0) {
                read = -1;
            }
            return read;
        }

    Can you verify? Otherwise, what am I doing wrong?

    JIRA | 8 years ago | Shaul Kushelevsky
    java.lang.RuntimeException: After retry (Offset 296)
  2.

    Tika process exits and skips rest of arc-file parsing.

    GitHub | 2 years ago | thomasegense
    java.lang.RuntimeException: After retry (Offset 29424030)
  3.

    GitHub comment 17#71220467

    GitHub | 2 years ago | machawk1
    java.lang.RuntimeException: After retry (Offset 94220)
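    All of these reports fail inside gotoEOR while the reader skips to the end of a record. In the WARC format, each record is a header block ended by a blank line (CRLF CRLF), then exactly Content-Length bytes of content, then two more CRLF pairs. A file saved with LF-only line endings, or a Content-Length that does not match the content, breaks that framing. The following is a minimal sketch of a standalone checker for one in-memory record; the class WarcFramingCheck and its methods are hypothetical and not part of webarchive-commons, and it assumes a single record with Latin-1 header bytes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not webarchive-commons code): checks that one WARC
 * record is framed as headers, blank line, exactly Content-Length bytes
 * of content, then CRLF CRLF.
 */
public class WarcFramingCheck {

    /** Returns the index just past the first CRLF CRLF, or -1 if there is none. */
    static int endOfHeaders(byte[] data) {
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Reads the Content-Length value from the header block (name matched case-insensitively). */
    static long contentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Long.parseLong(line.substring(colon + 1).trim());
            }
        }
        throw new IllegalArgumentException("no Content-Length header");
    }

    /** True if CR LF CR LF follows immediately after the declared content. */
    static boolean isWellTerminated(byte[] record) {
        int bodyStart = endOfHeaders(record);
        if (bodyStart < 0) {
            return false; // header block never ends with a CRLF blank line
        }
        String headers = new String(record, 0, bodyStart, StandardCharsets.ISO_8859_1);
        long end = bodyStart + contentLength(headers);
        return end + 4 <= record.length
                && record[(int) end] == '\r' && record[(int) end + 1] == '\n'
                && record[(int) end + 2] == '\r' && record[(int) end + 3] == '\n';
    }

    public static void main(String[] args) {
        String good = "WARC/0.18\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n";
        // Same record with LF-only line endings, e.g. after hand editing
        String lfOnly = good.replace("\r\n", "\n");
        // Same record with a Content-Length that overcounts the content
        String wrongLength = good.replace("Content-Length: 5", "Content-Length: 9");
        for (String r : new String[] {good, lfOnly, wrongLength}) {
            System.out.println(isWellTerminated(r.getBytes(StandardCharsets.ISO_8859_1)));
        }
    }
}
```

    Running main prints true for the CRLF record and false for the LF-only and overcounted variants. A check like this could be pointed at the bytes of a hand-edited test file before handing it to WARCReaderFactory.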

    Root Cause Analysis

    1. java.io.IOException

      Unexpected character 52(Expecting d)

      at org.archive.io.warc.WARCReader.readExpectedChar()
    2. webarchive-commons
      ArchiveReader$ArchiveRecordIterator.next
      1. org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
      2. org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
      3. org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
      4. org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
      5. org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
      6. org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
      7. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
      8. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
      8 frames
    3. example
      WarcTest.main
      1. example.WarcTest.main(WarcTest.java:23)
      1 frame
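    The logs report "Unexpected character a(Expecting d)", then 41, 57 and 52. Assuming these are hexadecimal byte values (the lowercase a and d suggest so), the hypothetical decoder below (UnexpectedCharDecoder is illustrative, not a library class) turns them into characters. Decoded, the reader wanted a carriage return (0x0d) every time and found a line feed (0x0a) or the letters A, W and R. That pattern is consistent with the reader looking for the CRLF that closes a record and landing on LF-only line endings or on the text of the next WARC/0.18 header.

```java
/**
 * Hypothetical sketch: decodes the values in "Unexpected character X(Expecting Y)",
 * assuming they are hexadecimal byte values.
 */
public class UnexpectedCharDecoder {

    /** Names control characters; shows printable ASCII as a quoted char. */
    static String describe(int b) {
        switch (b) {
            case 0x0d:
                return "CR (\\r)";
            case 0x0a:
                return "LF (\\n)";
            default:
                return (b >= 0x20 && b < 0x7f) ? "'" + (char) b + "'" : String.format("0x%02x", b);
        }
    }

    /** Parses a hex string such as "52" or "d" and describes the byte. */
    static String decode(String hex) {
        return describe(Integer.parseInt(hex, 16));
    }

    public static void main(String[] args) {
        // Values copied from the log messages in the first report
        for (String found : new String[] {"a", "41", "57", "52"}) {
            System.out.println("found " + decode(found) + ", expected " + decode("d"));
        }
    }
}
```

    Running main prints one line per logged value; the last one, for the Caused by message, reads found 'R', expected CR (\r).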