org.apache.commons.httpclient.URIException: http scheme specific part is too short: //

JIRA | Gordon Mohr | 10 years ago

    A log recovery has a lot of the following kinds of exceptions:

    09/29/2007 08:33:55 +0000 WARNING org.archive.crawler.frontier.RecoveryJournal importQueuesFromLog bad URI during log -recovery of queue contents
    org.apache.commons.httpclient.URIException: http scheme specific part is too short: //
        at org.archive.net.UURIFactory.checkHttpSchemeSpecificPartSlashPrefix(UURIFactory.java:574)
        at org.archive.net.UURIFactory.fixup(UURIFactory.java:462)
        at org.archive.net.UURIFactory.create(UURIFactory.java:322)
        at org.archive.net.UURIFactory.create(UURIFactory.java:312)
        at org.archive.net.UURIFactory.getInstance(UURIFactory.java:265)
        at org.archive.crawler.frontier.RecoveryJournal.importQueuesFromLog(RecoveryJournal.java:325)
        at org.archive.crawler.frontier.RecoveryJournal.access$000(RecoveryJournal.java:61)
        at org.archive.crawler.frontier.RecoveryJournal$1.run(RecoveryJournal.java:174)
        at java.lang.Thread.run(Thread.java:619)

    09/29/2007 08:33:55 +0000 WARNING org.archive.crawler.frontier.RecoveryJournal importQueuesFromLog bad URI during log -recovery of queue contents
    org.apache.commons.httpclient.URIException: gnu.inet.encoding.IDNAException: Contains non-LDH characters. $1
        at org.archive.net.UURIFactory.fixupDomainlabel(UURIFactory.java:655)
        at org.archive.net.UURIFactory.fixupAuthority(UURIFactory.java:606)
        at org.archive.net.UURIFactory.fixup(UURIFactory.java:468)
        at org.archive.net.UURIFactory.create(UURIFactory.java:322)
        at org.archive.net.UURIFactory.create(UURIFactory.java:312)
        at org.archive.net.UURIFactory.getInstance(UURIFactory.java:265)
        at org.archive.crawler.frontier.RecoveryJournal.importQueuesFromLog(RecoveryJournal.java:325)
        at org.archive.crawler.frontier.RecoveryJournal.access$000(RecoveryJournal.java:61)
        at org.archive.crawler.frontier.RecoveryJournal$1.run(RecoveryJournal.java:174)
        at java.lang.Thread.run(Thread.java:619)

    Also:

    09/29/2007 08:33:55 +0000 WARNING org.archive.crawler.frontier.RecoveryJournal importQueuesFromLog bad URI during log -recovery of queue contents
    org.apache.commons.httpclient.URIException: gnu.inet.encoding.IDNAException: Contains non-LDH characters. $1'

    09/29/2007 08:33:55 +0000 WARNING org.archive.crawler.frontier.RecoveryJournal importQueuesFromLog bad URI during log -recovery of queue contents
    org.apache.commons.httpclient.URIException: gnu.inet.encoding.IDNAException: Contains non-LDH characters. $langue.wik ipedia.org

    09/29/2007 08:33:55 +0000 WARNING org.archive.crawler.frontier.RecoveryJournal importQueuesFromLog bad URI during log -recovery of queue contents
    org.apache.commons.httpclient.URIException: gnu.inet.encoding.IDNAException: String too long.
    ....

    ...and so forth. In this case, they may be due to the recovery-log having been synthesized with bad URIs, but similar exceptions have been seen from a real recovery-log.

    Two issues:

    (1) If these come from a real recovery-log, no URI should have been written out that can't be read back in;
    (2) If from a synth recovery-log, perhaps the error should be more helpful (full origin line) but also less alarming (just log that the bad line has been ignored)
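    Point (2) above could look roughly like the following. This is only a sketch under assumptions, not the Heritrix RecoveryJournal code: the class TolerantRecoveryImport, its Result type, and the "tag, whitespace, URI" line layout are hypothetical, and java.net.URI stands in for UURIFactory.getInstance so that the example needs nothing beyond the JDK. Each bad line is logged once, in full, without a stack trace, and the import carries on.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of a tolerant recovery-log import; not Heritrix code. */
final class TolerantRecoveryImport {
    private static final Logger LOG =
            Logger.getLogger(TolerantRecoveryImport.class.getName());

    /** Outcome of one import pass: parsed URIs plus the lines that were ignored. */
    static final class Result {
        final List<URI> accepted = new ArrayList<>();
        final List<String> skippedLines = new ArrayList<>();
    }

    /**
     * Stand-in for UURIFactory.getInstance(String) (an assumption made for
     * this sketch). Parses with java.net.URI and also rejects URIs that
     * have no scheme or no host.
     */
    static URI parse(String candidate) throws URISyntaxException {
        URI uri = new URI(candidate);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new URISyntaxException(candidate, "missing scheme or host");
        }
        return uri;
    }

    /** Imports every parseable line; logs and remembers the rest. */
    static Result importLines(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            // Assumed layout: a tag, whitespace, then the URI.
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                result.skippedLines.add(line);
                LOG.info("ignored malformed recovery-log line: " + line);
                continue;
            }
            try {
                result.accepted.add(parse(fields[1]));
            } catch (URISyntaxException e) {
                // One line per bad entry, quoting the full origin line.
                result.skippedLines.add(line);
                LOG.info("ignored bad recovery-log line: " + line
                        + " (" + e.getReason() + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = importLines(List.of(
                "F+ http://example.org/a", "F+ http://", "F+ http://$1/"));
        System.out.println(r.accepted.size() + " accepted, "
                + r.skippedLines.size() + " ignored");
    }
}
```

    Keeping the skipped lines in the result, rather than only logging them, would also help with point (1): a non-empty list after importing a real recovery-log indicates that something was written out that cannot be read back in.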


    Root Cause Analysis

    1. org.apache.commons.httpclient.URIException

      http scheme specific part is too short: //

      at org.archive.net.UURIFactory.checkHttpSchemeSpecificPartSlashPrefix()
    2. webarchive-commons
      UURIFactory.getInstance
      1. org.archive.net.UURIFactory.checkHttpSchemeSpecificPartSlashPrefix(UURIFactory.java:574)
      2. org.archive.net.UURIFactory.fixup(UURIFactory.java:462)
      3. org.archive.net.UURIFactory.create(UURIFactory.java:322)
      4. org.archive.net.UURIFactory.create(UURIFactory.java:312)
      5. org.archive.net.UURIFactory.getInstance(UURIFactory.java:265)
      5 frames
    3. org.archive.crawler
      RecoveryJournal$1.run
      1. org.archive.crawler.frontier.RecoveryJournal.importQueuesFromLog(RecoveryJournal.java:325)
      2. org.archive.crawler.frontier.RecoveryJournal.access$000(RecoveryJournal.java:61)
      3. org.archive.crawler.frontier.RecoveryJournal$1.run(RecoveryJournal.java:174)
      3 frames
    4. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:619)
      1 frame
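
    The top frame's name and the message suggest that the failing input was an http or https URI whose scheme-specific part is nothing but the "//" prefix, for example a bare "http://". The following is a minimal sketch of a check with that behavior. It is an assumption based on the message text alone, not the UURIFactory source; the class SchemeSpecificPartCheck and its method are hypothetical, and IllegalArgumentException stands in for URIException.

```java
import java.util.Locale;

/** Hypothetical illustration of the failing condition; not UURIFactory code. */
final class SchemeSpecificPartCheck {

    /**
     * Throws if the URI is http/https and everything after the scheme's
     * colon is just "//", i.e. there is no authority at all.
     */
    static void check(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) {
            return; // no scheme, nothing to check here
        }
        String scheme = uri.substring(0, colon).toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return; // this check only concerns http(s)
        }
        String schemeSpecificPart = uri.substring(colon + 1);
        if (schemeSpecificPart.startsWith("//") && schemeSpecificPart.length() <= 2) {
            throw new IllegalArgumentException(
                    "http scheme specific part is too short: " + schemeSpecificPart);
        }
    }

    public static void main(String[] args) {
        check("http://example.org/"); // passes silently
        try {
            check("http://");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```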