java.lang.RuntimeException: After retry (Offset 218)

JIRA | Olaf Freyer | 1 decade ago

    Dear IA-Team,

    it seems there is yet another issue with WARC files in Heritrix-1.12.0: I'm unable to read non-compressed WARC files with the current release. This happens both when I write non-compressed WARC files directly and when I uncompress compressed WARC files (which I was able to read before uncompressing them).

    sh warcreader -f dump /heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc

    {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}
    TODO: Unimplemented
    19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
    WARNUNG: Trying skip of failed record cleanup of {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}: Unexpected character a(Expecting d)
    19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator hasNext
    WARNUNG: Trying skip of failed record cleanup of {content-type=text/plain, reader-identifier=/heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc, absolute-offset=0, subject-uri=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, record-identifier=urn:uuid:4806edc7-9244-4d70-af1d-1d6ff3ddca75, length=216, creation-date=20070319123730, type=warcinfo, Filename=IAH-20070319123730-00002-t5.warc, version=0.10}: Unexpected character 41(Expecting d)
    19.03.2007 13:47:27 org.archive.io.ArchiveReader$ArchiveRecordIterator next
    WARNUNG: Bad Record. Trying skip (Current offset 218): Unexpected character 57(Expecting d)
    Exception processing /heritrix/jobs/working3-20070319123718924/warcs/IAH-20070319123730-00002-t5.warc: After retry (Offset 218)
    java.lang.RuntimeException: After retry (Offset 218)
        at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:529)
        at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:455)
        at org.archive.io.warc.v10.WARCReader.dump(WARCReader.java:106)
        at org.archive.io.ArchiveReader.output(ArchiveReader.java:649)
        at org.archive.io.warc.v10.WARCReader.output(WARCReader.java:157)
        at org.archive.io.warc.v10.WARCReader.main(WARCReader.java:301)
    Caused by: java.io.IOException: Unexpected character 52(Expecting d)
        at org.archive.io.warc.v10.WARCReader.readExpectedChar(WARCReader.java:82)
        at org.archive.io.warc.v10.WARCReader.gotoEOR(WARCReader.java:72)
        at org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
        at org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
        at org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:579)
        at org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:554)
        at org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:522)
        ... 5 more

    Basically the same issue exists for the v12 WARCReader, too.

    Regards
    Olaf Freyer


    Root Cause Analysis

    1. java.io.IOException

      Unexpected character 52(Expecting d)

      at org.archive.io.warc.v10.WARCReader.readExpectedChar()
    2. org.archive.io
      WARCReader.gotoEOR
      1. org.archive.io.warc.v10.WARCReader.readExpectedChar(WARCReader.java:82)
      2. org.archive.io.warc.v10.WARCReader.gotoEOR(WARCReader.java:72)
      2 frames
    3. webarchive-commons
      ArchiveReader$ArchiveRecordIterator.next
      1. org.archive.io.ArchiveReader.cleanupCurrentRecord(ArchiveReader.java:192)
      2. org.archive.io.ArchiveReader.get(ArchiveReader.java:142)
      3. org.archive.io.ArchiveReader$ArchiveRecordIterator.innerNext(ArchiveReader.java:579)
      4. org.archive.io.ArchiveReader$ArchiveRecordIterator.exceptionNext(ArchiveReader.java:554)
      5. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:522)
      6. org.archive.io.ArchiveReader$ArchiveRecordIterator.next(ArchiveReader.java:455)
      6 frames
    4. org.archive.io
      WARCReader.dump
      1. org.archive.io.warc.v10.WARCReader.dump(WARCReader.java:106)
      1 frame
    5. webarchive-commons
      ArchiveReader.output
      1. org.archive.io.ArchiveReader.output(ArchiveReader.java:649)
      1 frame
    6. org.archive.io
      WARCReader.main
      1. org.archive.io.warc.v10.WARCReader.output(WARCReader.java:157)
      2. org.archive.io.warc.v10.WARCReader.main(WARCReader.java:301)
      2 frames
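
    The values in the "Unexpected character ...(Expecting d)" messages look like hexadecimal byte codes, although the report does not say so. Assuming that reading, the small sketch below decodes the codes that appear in the log into readable characters. The class name HexCodeDecoder and its describe method are hypothetical helpers written for this illustration; they are not part of Heritrix.

```java
public class HexCodeDecoder {
    // Turns a hex byte value taken from the log (e.g. "52") into a readable label.
    // Assumption: the log prints character codes in hexadecimal.
    static String describe(String hex) {
        int value = Integer.parseInt(hex, 16);
        switch (value) {
            case 0x0d:
                return "CR";
            case 0x0a:
                return "LF";
            default:
                // Printable ASCII is shown as the character itself.
                if (value >= 0x20 && value < 0x7f) {
                    return String.valueOf((char) value);
                }
                return String.format("0x%02x", value);
        }
    }

    public static void main(String[] args) {
        // Codes in the order they appear in the report, followed by the expected one.
        String[] codes = {"a", "41", "57", "52", "d"};
        for (String hex : codes) {
            System.out.println(hex + " -> " + describe(hex));
        }
    }
}
```

    Under this assumption, the reader was expecting a carriage return (d) and instead met a line feed (a) and then the letters A, W and R. That would be consistent with the reader looking for a record terminator at a position where other data sits, but this is an inference from the log and not something the report confirms.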