java.lang.RuntimeException: After retry (Offset 296)

JIRA | Shaul Kushelevsky | 7 years ago
  1. 0

    Here is my code:

        import java.io.File;
        import java.io.IOException;
        import java.util.Iterator;

        import org.archive.io.ArchiveRecord;
        import org.archive.io.warc.WARCReader;
        import org.archive.io.warc.WARCReaderFactory;

        public class WarcTest {
            public static void main(String[] args) {
                String warcPath = "c:\\temp\\pipeData\\warc\\test\\11.warc";
                WARCReader warcReader;
                try {
                    warcReader = WARCReaderFactory.get(new File(warcPath));
                    Iterator<ArchiveRecord> it = warcReader.iterator();
                    // Dump every record in the file.
                    while (it.hasNext()) {
                        ArchiveRecord record = it.next();
                        record.dump();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    And here is the WARC file:

        WARC/0.18
        WARC-Type: warcinfo
        WARC-Date: 2009-07-15T06:56:30Z
        WARC-Filename: 1.warc
        WARC-Record-ID: <urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>
        Content-Type: application/warc-fields
        Content-Length: 80

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800

        WARC/0.18
        WARC-Type: resource
        WARC-Target-URI: http://vteoria.com/dif/warc_shaul.html
        WARC-Date: 20090715065512
        IP-Address: 72.167.131.216
        WARC-Record-ID: <urn:uuid:63d42fd5-0fae-4a15-81a2-1443e394b449>
        Content-Type: application/http; msgtype=response
        Content-Length: 673

        HTTP/1.1 200 OK
        Date: Wed, 15 Jul 2009 06:55:12 GMT
        Server: Apache
        Connection: close
        Content-Type: text/html

        <HTML>
        <HEAD>
        <TITLE>Your Title Here</TITLE>
        </HEAD>
        <BODY BGCOLOR="FFFFFF">
        <HR>
        <a href="http://somegreatsite.com">Link Name</a> is a link to another nifty site
        <H1>This is a Header</H1>
        <H2>This is a Medium Header</H2>
        Send me mail at <a href="mailto:support@yourcompany.com"> support@yourcompany.com</a>.
        <P> This is a new paragraph!
        <P> <B>This is a new paragraph!</B>
        <BR> <B><I>This is a new sentence without a paragraph break, in bold italics.</I></B>
        <HR>
        </BODY>
        </HTML>

    The errors that I get are:

        Content-Description: Made from C:\temp\1.arc by org.archive.io.Arc2Warc/5800
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character a(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNING: Trying skip of failed record cleanup of {WARC-Type=warcinfo, WARC-Filename=1.warc, reader-identifier=c:\temp\pipeData\warc\test\11.warc, WARC-Date=2009-07-15T06:56:30Z, absolute-offset=0, Content-Length=80, WARC-Record-ID=<urn:uuid:f496e1f2-b96c-45f0-9f43-a1385c7b0939>, Content-Type=application/warc-fields}: Unexpected character 41(Expecting d)
        Jul 27, 2009 1:17:14 PM org.archive.io.ArchiveReader$ArchiveRecordIterator next
        WARNING: Bad Record. Trying skip (Current offset 296): Unexpected character 57(Expecting d)
        Exception in thread "main" java.lang.RuntimeException: After retry (Offset 296)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:535)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
            at example.WarcTest.main(WarcTest.java:23)
        Caused by: java.io.IOException: Unexpected character 52(Expecting d)
            at org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
            at org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
            at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
            at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
            ... 2 more

    I think I solved this by modifying the class ArchiveRecord and changing the "read" method to be:

        public int read(byte[] b, int offset, int length) throws IOException {
            int read = Math.min(length, available());
            if (read != -1 && read != 0) {
                read = this.in.read(b, offset, read);
                if (read == -1) {
                    String msg = "Premature EOF before end-of-record: "
                        + getHeader().getHeaderFields();
                    if (isStrict()) {
                        throw new IOException(msg);
                    }
                    setEor(true);
                    System.err.println(Level.WARNING.toString() + " " + msg);
                }
                if (this.digest != null && read >= 0) {
                    this.digest.update(b, offset, read);
                }
            }
            /*
             * Shaul K. set the read to -1 only after the actual increment is done.
             */
            incrementPosition(read);
            if (read == -1 || read == 0) {
                read = -1;
            }
            return read;
        }

    Can you verify? Otherwise, what am I doing wrong?

    JIRA | 7 years ago | Shaul Kushelevsky
    java.lang.RuntimeException: After retry (Offset 296)
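
    The patch proposed in the post above is about keeping a record's position counter in step with the bytes actually consumed before end-of-record is signalled. That general idea can be sketched in isolation as follows. This is not the library's ArchiveRecord: `BoundedRecordStream`, its fields, and `getPosition()` are hypothetical names made up for illustration, and the sketch assumes only that the record length is known up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a record stream limited to a declared length. */
public class BoundedRecordStream extends InputStream {
    private final InputStream in;
    private final long length;  // declared record length in bytes
    private long position = 0;  // bytes consumed so far

    public BoundedRecordStream(InputStream in, long length) {
        this.in = in;
        this.length = length;
    }

    public long getPosition() {
        return position;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int offset, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        long remaining = length - position;
        if (remaining <= 0) {
            return -1; // end of record: nothing consumed, position unchanged
        }
        int n = in.read(b, offset, (int) Math.min(len, remaining));
        if (n > 0) {
            position += n; // advance only by what was actually read
        }
        return n; // -1 here means the underlying stream ended early
    }

    public static void main(String[] args) throws IOException {
        // Six bytes belong to the record; the rest stands for the next record.
        byte[] data = "recordNEXT".getBytes(StandardCharsets.US_ASCII);
        BoundedRecordStream record =
            new BoundedRecordStream(new ByteArrayInputStream(data), 6);
        byte[] buf = new byte[4];
        int n;
        while ((n = record.read(buf, 0, buf.length)) != -1) {
            System.out.println("read " + n + " byte(s), position " + record.getPosition());
        }
    }
}
```

    In this sketch the position never moves on a -1 return, so whatever code skips to the end of the record afterwards starts from the true offset. Whether the real class needs the same change is something only the maintainers can confirm.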
  3. 0

    Tika process exits and skips rest of arc-file parsing.

    GitHub | 2 years ago | thomasegense
    java.lang.RuntimeException: After retry (Offset 29424030)
  5. 0

    GitHub comment 17#71220467

    GitHub | 2 years ago | machawk1
    java.lang.RuntimeException: After retry (Offset 94220)
  6. 0

    Dear IA-Team,

    There seems to be yet another issue with WARC files in Heritrix-1.12.0. I'm unable to read non-compressed WARC files with the current release. This happens both when I write non-compressed WARC files directly and when I uncompress compressed WARC files (which I was able to read before uncompressing them).

        sh warcreader -f dump /heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc
        {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}
        TODO: Unimplemented
        19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNUNG: Trying skip of failed record cleanup of {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}: Unexpected character a(Expecting d)
        19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
        WARNUNG: Trying skip of failed record cleanup of {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}: Unexpected character 41(Expecting d)
        19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator next
        WARNUNG: Bad Record. Trying skip (Current offset 218): Unexpected character 57(Expecting d)
        Exception processing /heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc: After retry (Offset 218)
        java.lang.RuntimeException: After retry (Offset 218)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:529)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:455)
            at org.archive.io.warc.v10.WARCReader.dump(WARCReader.java:106)
            at org.archive.io.ArchiveReader.output(ArchiveReader.java:649)
            at org.archive.io.warc.v10.WARCReader.output(WARCReader.java:157)
            at org.archive.io.warc.v10.WARCReader.main(WARCReader.java:301)
        Caused by: java.io.IOException: Unexpected character 52(Expecting d)
            at org.archive.io.warc.v10.WARCReader.readExpectedChar(WARCReader.java:82)
            at org.archive.io.warc.v10.WARCReader.gotoEOR(WARCReader.java:72)
            at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
            at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:579)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:554)
            at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:522)
            ... 5 more

    Basically the same issue exists for the v12 WARCReader, too.

    Regards,
    Olaf Freyer

    JIRA | 10 years ago | Olaf Freyer
    java.lang.RuntimeException: After retry (Offset 218)

    Root Cause Analysis

    1. java.io.IOException

      Unexpected character 52(Expecting d)

      at org.archive.io.warc.WARCReader.readExpectedChar()
    2. webarchive-commons
      ArchiveReader$ArchiveRecordIterator.next
      1. org.archive.io.warc.WARCReader.readExpectedChar(WARCReader.java:82)
      2. org.archive.io.warc.WARCReader.gotoEOR(WARCReader.java:72)
      3. org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
      4. org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
      5. org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:585)
      6. org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:560)
      7. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:528)
      8. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:461)
      8 frames
    3. example
      WarcTest.main
      1. example.WarcTest.main(WarcTest.java:23)
      1 frame
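
    The codes in "Unexpected character 52(Expecting d)" are easier to read once decoded. The sketch below assumes they are hexadecimal byte values, which the single-letter codes `a` and `d` suggest but which this page does not state outright. `HexCodeDecoder` and `describe` are hypothetical names used only for this illustration.

```java
public class HexCodeDecoder {
    /** Describes one code from the log, assuming it is a hexadecimal byte value. */
    static String describe(String hexCode) {
        int value = Integer.parseInt(hexCode, 16);
        switch (value) {
            case 0x0d:
                return "CR (carriage return)";
            case 0x0a:
                return "LF (line feed)";
            default:
                // Printable ASCII is shown as the character itself.
                if (value >= 0x20 && value < 0x7f) {
                    return "'" + (char) value + "'";
                }
                return "byte 0x" + hexCode;
        }
    }

    public static void main(String[] args) {
        // Codes copied from the warnings and the root-cause message on this page.
        String[] codes = {"d", "a", "41", "57", "52"};
        for (String code : codes) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

    Under that assumption the reader wanted a carriage return each time and instead found a line feed, then the letters A, W, and R. Those letters all occur in "WARC", so one possible reading is that the reader had already run into the next record's header while still looking for the end of the current one.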