java.io.FileNotFoundException: /heritrix/jobs/h22-20070315210752398/scratch/tt104http.ris (Too many open files)

JIRA | Olaf Freyer | 10 years ago
  1. 0

    With the current Heritrix 1.12 I repeatedly get the error below, logged at SEVERE level (SCHWERWIEGEND is the German-locale label). So far it only seems to happen with .pdf documents, but I abort mid-fetch on anything other than text/html and application/pdf anyway.

    03/15/2007 21:44:28 +0000 SCHWERWIEGEND org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor innerProcess Failed write of Record: http://www.pandacom.de/produkte/hersteller-highlights/Witcom_spec_witview.pdf
    java.io.FileNotFoundException: /heritrix/jobs/h22-20070315210752398/scratch/tt104http.ris (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.archive.io.RandomAccessInputStream.<init>(RandomAccessInputStream.java:79)
        at org.archive.io.ReplayInputStream.<init>(ReplayInputStream.java:97)
        at org.archive.io.ReplayInputStream.<init>(ReplayInputStream.java:76)
        at org.archive.io.RecordingOutputStream.getReplayInputStream(RecordingOutputStream.java:356)
        at org.archive.io.RecordingOutputStream.getReplayInputStream(RecordingOutputStream.java:348)
        at org.archive.io.RecordingInputStream.getReplayInputStream(RecordingInputStream.java:150)
        at org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.writeResponse(ExperimentalV10WARCWriterProcessor.java:219)
        at org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.write(ExperimentalV10WARCWriterProcessor.java:164)
        at org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.innerProcess(ExperimentalV10WARCWriterProcessor.java:116)
        at org.archive.crawler.framework.Processor.process(Processor.java:109)
        at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

    This issue renders my spider unusable: soon after this failure, the Writers also start failing with "too many open files". I use the same setup that I used for Heritrix 1.10.2, and I adapted my config to do exactly what it did before the version switch.

    This has happened to me twice so far, so the issue seems reproducible. I presume there is a file descriptor leak somewhere.

    Regards,
    Olaf Freyer

    P.S.: If it happens a third time, I'll try to hand over order.xml and seeds.txt to one of you so you can try to reproduce it, too.

    JIRA | 10 years ago | Olaf Freyer
    java.io.FileNotFoundException: /heritrix/jobs/h22-20070315210752398/scratch/tt104http.ris (Too many open files)
  3. 0

    Search failing when cluster busy

    GitHub | 7 years ago | clintongormley
    org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [ia_object_1270046679][0] Failed to append snapshot translog into [/opt/elasticsearch/data/iAnnounce/ia_object_1270046679/0/translog/translog-3]
  5. 0

    java.io.IOException: Invalid argument and Too many open files

    Stack Overflow | 6 years ago | purple
    java.io.FileNotFoundException: /root/TorrentStealer/downloads/MAME - Update ROMs (v0.141 to v0.141u2)/lah_l104.zip (Too many open files)
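
The Heritrix report above suspects a file descriptor leak. One way to check such a suspicion is to watch the JVM's open-descriptor count while the process runs: a count that climbs steadily toward the limit points to a leak. The following is a minimal sketch, not part of Heritrix; the class name `FdProbe` and its methods are hypothetical. It assumes a Unix-like system and a JDK that ships `com.sun.management.UnixOperatingSystemMXBean`; elsewhere it returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper for illustration; not a Heritrix class.
public class FdProbe {

    /** Open file descriptors of this JVM, or -1 if the platform does not expose the count. */
    static long openDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Per-process descriptor limit, or -1 if the platform does not expose it. */
    static long maxDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Logging this periodically during a crawl would show whether the count keeps growing.
        System.out.println("open descriptors: " + openDescriptors()
                + " of max " + maxDescriptors());
    }
}
```

If the count stays flat under load, the limit itself may simply be too low for the workload rather than descriptors being leaked.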

Root Cause Analysis

  1. java.io.FileNotFoundException

    /heritrix/jobs/h22-20070315210752398/scratch/tt104http.ris (Too many open files)

    at java.io.RandomAccessFile.open()
  2. Java RT
    RandomAccessFile.<init>
    1. java.io.RandomAccessFile.open(Native Method)
    2. java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    2 frames
  3. webarchive-commons
    RecordingInputStream.getReplayInputStream
    1. org.archive.io.RandomAccessInputStream.<init>(RandomAccessInputStream.java:79)
    2. org.archive.io.ReplayInputStream.<init>(ReplayInputStream.java:97)
    3. org.archive.io.ReplayInputStream.<init>(ReplayInputStream.java:76)
    4. org.archive.io.RecordingOutputStream.getReplayInputStream(RecordingOutputStream.java:356)
    5. org.archive.io.RecordingOutputStream.getReplayInputStream(RecordingOutputStream.java:348)
    6. org.archive.io.RecordingInputStream.getReplayInputStream(RecordingInputStream.java:150)
    6 frames
  4. org.archive.crawler
    ToeThread.run
    1. org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.writeResponse(ExperimentalV10WARCWriterProcessor.java:219)
    2. org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.write(ExperimentalV10WARCWriterProcessor.java:164)
    3. org.archive.crawler.writer.ExperimentalV10WARCWriterProcessor.innerProcess(ExperimentalV10WARCWriterProcessor.java:116)
    4. org.archive.crawler.framework.Processor.process(Processor.java:109)
    5. org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
    6. org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)
    6 frames
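
The trace above fails inside the `RandomAccessFile` constructor, which is where the process asks the operating system for a new descriptor. The source does not say where descriptors are being left open. As a general illustration only, the sketch below shows the usual way to guarantee that each opened `RandomAccessFile` releases its descriptor, even when reading throws. The class `ReplayReadSketch` and its methods are hypothetical and are not Heritrix code. It uses try-with-resources, which needs Java 7 or later; on the older JVMs in use at the time of the report, the equivalent is a `try`/`finally` block that calls `close()`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical illustration; not a Heritrix class.
public class ReplayReadSketch {

    /** Opens the file, reads its first byte, and always closes the descriptor. */
    static int firstByte(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.read();
        }
    }

    /**
     * Writes one byte to a temporary scratch file, then reopens it the given
     * number of times. Returns how many reads saw the expected byte.
     */
    static int reopenRepeatedly(int times) {
        try {
            File scratch = File.createTempFile("scratch", ".ris");
            try {
                try (RandomAccessFile out = new RandomAccessFile(scratch, "rw")) {
                    out.write(42);
                }
                int ok = 0;
                for (int i = 0; i < times; i++) {
                    if (firstByte(scratch) == 42) {
                        ok++;
                    }
                }
                return ok;
            } finally {
                scratch.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reopenRepeatedly(5000));
    }
}
```

Because every open is paired with a close, the loop can reopen the file more times than a typical per-process descriptor limit allows at once. Without the close, the same loop would be expected to fail with "Too many open files" once the limit is reached.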