java.lang.IllegalArgumentException: http://www.bbk.es%00@www.geocities.com/chero00751/bbk.html L a/@href contains a control character(s) or invalid code point: 0x1

JIRA | Gordon Mohr | 7 years ago
  1. 0

    The following alert/error occurred in the Geocities crawl. At the very least, it interrupted writing of that record, and it may also have fouled that WARC, leaving it in an incomplete state. Heritrix should handle this without error, writing whatever is reasonable and discarding or fixing up the outlink as appropriate.

    Aug 30, 2009 11:04:12 PM org.archive.crawler.framework.ToeThread recoverableProblem
    SEVERE: Problem java.lang.IllegalArgumentException: http://www.bbk.es%00@www.geocities.com/chero00751/bbk.html L a/@href contains a control character(s) or invalid code point: 0x1 occured when trying to process 'http://geocities.com/chero00751/' at step ABOUT_TO_BEGIN_PROCESSOR in WARCWriterProcessor (in thread 'ToeThread #91: http://geocities.com/chero00751/'; in processor 'WARCWriterProcessor')
    java.lang.IllegalArgumentException: http://www.bbk.es%00@www.geocities.com/chero00751/bbk.html L a/@href contains a control character(s) or invalid code point: 0x1
        at org.archive.util.anvl.SubElement.checkControlCharacter(SubElement.java:65)
        at org.archive.util.anvl.Value.checkCharacter(Value.java:51)
        at org.archive.util.anvl.SubElement.baseCheck(SubElement.java:50)
        at org.archive.util.anvl.Value.baseCheck(Value.java:45)
        at org.archive.util.anvl.SubElement.<init>(SubElement.java:40)
        at org.archive.util.anvl.Value.<init>(Value.java:40)
        at org.archive.util.anvl.ANVLRecord.addLabelValue(ANVLRecord.java:87)
        at org.archive.modules.writer.WARCWriterProcessor.writeMetadata(WARCWriterProcessor.java:463)
        at org.archive.modules.writer.WARCWriterProcessor.write(WARCWriterProcessor.java:273)
        at org.archive.modules.writer.WARCWriterProcessor.innerProcessResult(WARCWriterProcessor.java:186)
        at org.archive.modules.Processor.process(Processor.java:138)
        at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:298)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:152)
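The report asks for the outlink to be either fixed up or discarded before it reaches the metadata record. The following is a minimal sketch of both options, not Heritrix code: the class `OutlinkSanitizer` and its methods `fixUp` and `keepIfClean` are hypothetical names, and the sketch assumes the rejected characters are the ISO control characters (U+0000 to U+001F and U+007F to U+009F). The exact rule applied by `SubElement.checkControlCharacter` is not shown in the report.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper; illustrates "fix up" and "discard" for an outlink string. */
public final class OutlinkSanitizer {
    private OutlinkSanitizer() {}

    /** Fix-up option: percent-encode each control character as its UTF-8 bytes. */
    public static String fixUp(String value) {
        StringBuilder out = new StringBuilder(value.length());
        value.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp)) {
                // Assumption: ISO control characters are what the ANVL check rejects.
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Discard option: return the value only if it has no control characters. */
    public static Optional<String> keepIfClean(String value) {
        boolean dirty = value.codePoints().anyMatch(cp -> Character.isISOControl(cp));
        return dirty ? Optional.empty() : Optional.of(value);
    }
}
```

With this sketch, an outlink containing the raw character 0x1 would either be written with `%01` in its place or be dropped from the record. Which of the two is appropriate depends on whether the archive should preserve a trace of the malformed link.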

  2. 0
    Some bots are sending malformed HTTP requests to your site. Try to find their IP addresses in the access logs and ask them to fix the bots or blacklist them.

    Root Cause Analysis

    1. java.lang.IllegalArgumentException

      http://www.bbk.es%00@www.geocities.com/chero00751/bbk.html L a/@href contains a control character(s) or invalid code point: 0x1

      at org.archive.util.anvl.SubElement.checkControlCharacter()
    2. webarchive-commons
      ANVLRecord.addLabelValue
      1. org.archive.util.anvl.SubElement.checkControlCharacter(SubElement.java:65)
      2. org.archive.util.anvl.Value.checkCharacter(Value.java:51)
      3. org.archive.util.anvl.SubElement.baseCheck(SubElement.java:50)
      4. org.archive.util.anvl.Value.baseCheck(Value.java:45)
      5. org.archive.util.anvl.SubElement.<init>(SubElement.java:40)
      6. org.archive.util.anvl.Value.<init>(Value.java:40)
      7. org.archive.util.anvl.ANVLRecord.addLabelValue(ANVLRecord.java:87)
      7 frames
    3. org.archive.modules
      Processor.process
      1. org.archive.modules.writer.WARCWriterProcessor.writeMetadata(WARCWriterProcessor.java:463)
      2. org.archive.modules.writer.WARCWriterProcessor.write(WARCWriterProcessor.java:273)
      3. org.archive.modules.writer.WARCWriterProcessor.innerProcessResult(WARCWriterProcessor.java:186)
      4. org.archive.modules.Processor.process(Processor.java:138)
      4 frames
    4. org.archive.crawler
      ToeThread.run
      1. org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:298)
      2. org.archive.crawler.framework.ToeThread.run(ToeThread.java:152)
      2 frames
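The frames above show the `IllegalArgumentException` raised under `ANVLRecord.addLabelValue` propagating through `WARCWriterProcessor.writeMetadata` all the way to `ToeThread`, so one bad outlink aborts the whole record. A minimal sketch of isolating the failure per value follows. `SafeMetadataWriter` and `addAll` are hypothetical names, and the `BiConsumer` sink merely stands in for a call such as `addLabelValue`, on the assumption (not confirmed by the report) that it takes a label and a value as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical wrapper; keeps one rejected value from aborting the whole record. */
public final class SafeMetadataWriter {
    private SafeMetadataWriter() {}

    /**
     * Passes each value to the sink under the given label.
     * Values the sink rejects with IllegalArgumentException are collected
     * and returned instead of being rethrown.
     */
    public static List<String> addAll(String label,
                                      List<String> values,
                                      BiConsumer<String, String> sink) {
        List<String> rejected = new ArrayList<>();
        for (String value : values) {
            try {
                sink.accept(label, value);
            } catch (IllegalArgumentException e) {
                // Skip only this value; the remaining values are still written.
                rejected.add(value);
            }
        }
        return rejected;
    }
}
```

The returned list lets the caller log or count the skipped outlinks rather than silently losing them.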