Atlassian JIRA | Andrew Myers [Atlassian] | 7 years ago
    Vinay reports getting this OOME stack barely more than an hour after starting a busy crawl that has run without problems in a nearly identical configuration for the last 3 months:

    java.lang.OutOfMemoryError
    Stacktrace: java.lang.OutOfMemoryError at Method) at at at at$CompressedStream.( at ...etc...

    Other info collected:

    'top' line for the JVM (note VIRT: far larger than the heap, and near the 4GB mark):

    PID   USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+     COMMAND
    12228 webcrawl 18 0  4064m 2.3g 46m S 0.0  60.3 130:38.15 /usr/lib/jdk1.6.0_01/bin/java -Dj

    5 of the last 6 URLs shown in crawl.log are >50MB files; 4 of those 5 are compressed video (gzip stress?).

    Unsure why this is triggered now, but it looks like this JVM issue: "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not called promptly enough"

    Paradoxically, a larger heap, by limiting the need for heap GC, could make this problem worse, as could anything else that increases the proportion of crawler effort using gzip/native space compared to heap space. (This is a dedup crawl writing both ARCs and WARCs, so a relatively large amount of compression is occurring.)
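    For reference, the usual way to avoid depending on finalizers for the JVM issue quoted above is to call end() on each Inflater/Deflater as soon as it is no longer needed, so the native zlib buffers are released immediately rather than whenever a heap GC happens to run finalization. The sketch below only illustrates that general pattern; the class and method names are hypothetical, and it is not taken from the crawler's code or from any fix applied to this issue.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not crawler code): shows explicit release of native zlib memory.
final class ExplicitZlibRelease {

    /** Compresses data and frees the Deflater's native buffers before returning. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            // Release native memory now instead of waiting for finalization.
            deflater.end();
        }
    }

    /** Decompresses data and frees the Inflater's native buffers before returning. */
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: input is truncated or needs a preset dictionary.
                    throw new IllegalArgumentException("incomplete or unsupported zlib input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib input", e);
        } finally {
            // Same idea on the read side: free native buffers deterministically.
            inflater.end();
        }
    }
}
```

    With this pattern the native footprint tracks the number of compressors actually in use, independent of how often the heap is collected, which is why heap size stops mattering for this particular failure mode.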

    JIRA | 9 years ago | Gordon Mohr

