JIRA | Michael Stack | 10 years ago
    I happen to have a seed list of nearly 1024 entries. Not totally surprisingly, Heritrix behaves a little oddly with that many seeds.

    First, crawls with either 0.6.0 or the latest CVS build fail because too many files are opened almost immediately, and then neither socket operations nor file logging are able to proceed. A typical exception: .....

    Next up, using the current CVS build, a surprising number (like, ~70) of java.util.ConcurrentModificationExceptions occurred in the first moments of the crawl (and then intermittently throughout), all with the same stack trace. An example:

    20040427194255925 -5 39804 #48 124 text/html 3t java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(
        at java.util.AbstractList$
        at org.archive.crawler.scope.HostScope.focusAccepts(
        at org.archive.crawler.framework.CrawlScope.innerAccepts(
        at org.archive.crawler.framework.Filter.accepts(
        at org.archive.crawler.basic.Postselector.schedule(
        at org.archive.crawler.basic.Postselector.handleLinkCollection(Postselector.java:262)
        at org.archive.crawler.basic.Postselector.innerProcess(
        at org.archive.crawler.framework.Processor.process(
        at org.archive.crawler.framework.ToeThread.processCrawlUri(
        at

    Looking at the code, it looks like the CrawlScope class hands out an iterator on the scope's seeds list; that iteration needs to synchronize on the list (per the note in izedCollection(java.util.Collection) ), which I guess is going to take some refactoring.

    Should it be relevant, the few changes made to the default configuration for this crawl, other than adding a pile of seeds, were:

    - HostScope
    - max-link-hops 1
    - total-bandwidth-usage-KB-sec 500

    Otherwise, the crawl for this large seed list seems to be proceeding apace.
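
The synchronization point raised in the report can be illustrated with a minimal sketch. It is not Heritrix code: the class `SeedListSketch`, its methods, and the host names are hypothetical, and it assumes the seeds are held in a list wrapped with `Collections.synchronizedList`. Such a wrapper locks individual calls like `add`, but iterating over it is only safe when the caller holds the list's lock for the whole loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a scope that keeps a shared seed list.
final class SeedListSketch {
    // Assumption: the seed list is a synchronized wrapper around an ArrayList.
    private final List<String> seeds = Collections.synchronizedList(new ArrayList<>());

    void addSeed(String host) {
        seeds.add(host); // single calls are locked by the wrapper itself
    }

    // Iteration is NOT locked by the wrapper, so hold the list's monitor
    // for the duration of the loop to keep writers out.
    boolean focusAccepts(String host) {
        synchronized (seeds) {
            for (String seed : seeds) {
                if (seed.equals(host)) {
                    return true;
                }
            }
        }
        return false;
    }

    int size() {
        return seeds.size();
    }

    // Runs several threads that add seeds and iterate concurrently.
    // Returns {final list size, number of threads that hit a RuntimeException}.
    static int[] runDemo(int threads, int perThread) {
        SeedListSketch scope = new SeedListSketch();
        AtomicInteger failures = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        scope.addSeed("host-" + id + "-" + i);
                        scope.focusAccepts("no-such-host"); // forces a full scan
                    }
                } catch (RuntimeException e) {
                    failures.incrementAndGet(); // would include ConcurrentModificationException
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { scope.size(), failures.get() };
    }

    public static void main(String[] args) {
        int[] result = runDemo(4, 200);
        System.out.println("seeds=" + result[0] + " failures=" + result[1]);
    }
}
```

Removing the `synchronized (seeds)` block in `focusAccepts` makes the loop race with `addSeed`, which is the kind of interleaving that can surface as a `ConcurrentModificationException`. If the list is read far more often than it is written, a copy-on-write list is another option, at the cost of copying on every add.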

    Root Cause Analysis

    1. java.util.ConcurrentModificationException

      No message provided

      at java.util.AbstractList$Itr.checkForComodification()
    2. Java RT
      1. java.util.AbstractList$Itr.checkForComodification(
      2. java.util.AbstractList$
      2 frames
    3. org.archive.crawler
      1. org.archive.crawler.scope.HostScope.focusAccepts(
      2. org.archive.crawler.framework.CrawlScope.innerAccepts(
      3. org.archive.crawler.framework.Filter.accepts(
      4. org.archive.crawler.basic.Postselector.schedule(
      5. org.archive.crawler.basic.Postselector.handleLinkCollection(
      6. org.archive.crawler.basic.Postselector.innerProcess(
      7. org.archive.crawler.framework.Processor.process(
      8. org.archive.crawler.framework.ToeThread.processCrawlUri(
      9 frames