org.archive.net.FTPException: FTP error code: 550

JIRA | Noah Levitt | 8 years ago

    FTP entries in an ARC file currently look like this:

        ftp://ftp.ksl.stanford.edu/welcome.msg 171.64.71.195 20081121190026 no-type 56
        ***** ***** Stanford Knowledge Systems Laboratory *****

    There is no header, only body content.

    When Heritrix encounters an error while trying to download a file, for example:

        550 foo: Permission denied.

    it throws an exception, which propagates to the logs:

        11/21/2008 19:00:45 +0000 SEVERE org.archive.crawler.fetcher.FetchFTP innerProcess FTP server reported problem.
        org.archive.net.FTPException: FTP error code: 550
            at org.archive.net.ClientFTP.openDataConnection(ClientFTP.java:130)
            at org.archive.crawler.fetcher.FetchFTP.fetch(FetchFTP.java:312)
            at org.archive.crawler.fetcher.FetchFTP.innerProcess(FetchFTP.java:252)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

    Heritrix still tries to write to the ARC, but fails because there is no content:

        11/21/2008 19:00:45 +0000 SEVERE org.archive.crawler.framework.ToeThread recoverableProblem Problem java.lang.NullPointerException occured when trying to process 'ftp://ftp.ksl.stanford.edu/dev/ticotsord' at step ABOUT_TO_BEGIN_PROCESSOR in Archiver
        java.lang.NullPointerException
            at org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcessor.java:122)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

    So there is no record in the ARC file at all. But "550 foo: Permission denied." is essentially equivalent to an HTTP 403: it should be archived somehow, and it should not spew stack traces into the logs.

    So I propose we include a "header" section in the ARC record for FTP transactions; "550 foo: Permission denied." would go there. On a successful get, the header would contain something like "150 Binary data connection for /welcome.msg (76.103.251.45,57342) (56 bytes)." Would this break anything?
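    A minimal sketch of the proposed layout, in Java since Heritrix is written in Java. It assumes (the proposal does not say this) that the header block is the FTP reply line followed by a blank line, mirroring how HTTP records carry their response headers, and that the ARC length field counts header plus body. The URL-record line follows the ARC v1 form in the welcome.msg example (URL, IP, 14-digit date, content type, length). `FtpArcRecordSketch` and `buildRecord` are hypothetical names, not Heritrix API, and the values in `main` are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the proposed FTP record layout for an ARC v1 file.
 * Assumption: header block = FTP reply line + CRLF + empty line, then the file
 * body (empty when the server refused the transfer).
 * Names are illustrative only; this is not Heritrix API.
 */
public class FtpArcRecordSketch {

    public static byte[] buildRecord(String url, String ip, String timestamp14,
                                     String ftpReply, byte[] body) {
        // Assumed header block: reply line + CRLF + empty line, mirroring HTTP records.
        byte[] header = (ftpReply + "\r\n\r\n").getBytes(StandardCharsets.UTF_8);
        // The length field counts everything after the URL-record line: header + body.
        long length = (long) header.length + body.length;

        // ARC v1 URL-record line: URL, IP address, 14-digit date, content type, length.
        String urlRecord = url + " " + ip + " " + timestamp14 + " no-type " + length + "\n";
        byte[] recordLine = urlRecord.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(recordLine, 0, recordLine.length);
        out.write(header, 0, header.length);
        out.write(body, 0, body.length);
        out.write('\n'); // trailing newline separating this record from the next
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Illustrative values for the refused fetch from the log above.
        byte[] denied = buildRecord("ftp://ftp.ksl.stanford.edu/dev/ticotsord",
                "171.64.71.195", "20081121190045",
                "550 foo: Permission denied.", new byte[0]);
        System.out.print(new String(denied, StandardCharsets.UTF_8));
    }
}
```

    Under these assumptions the refused fetch yields a record of length 31 (the 27-byte reply plus CRLF and a blank line) with no body, so a 550 is archived instead of producing no record at all.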


    Root Cause Analysis

    1. org.archive.net.FTPException: FTP error code: 550
         at org.archive.net.ClientFTP.openDataConnection()

    2. webarchive-commons, ClientFTP.openDataConnection (1 frame):
         1. org.archive.net.ClientFTP.openDataConnection(ClientFTP.java:130)

    3. org.archive.crawler, ToeThread.run (5 frames):
         1. org.archive.crawler.fetcher.FetchFTP.fetch(FetchFTP.java:312)
         2. org.archive.crawler.fetcher.FetchFTP.innerProcess(FetchFTP.java:252)
         3. org.archive.crawler.framework.Processor.process(Processor.java:112)
         4. org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
         5. org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)
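    The root cause is that ClientFTP.openDataConnection turns the 550 reply into an FTPException. A minimal sketch of an alternative the proposal points toward: parse the three-digit reply code and classify it by its first digit, as RFC 959 defines it, so a permanent negative reply such as 550 can be recorded as a fetch outcome (like an HTTP 403) rather than thrown. `FtpReplySketch`, `replyCode`, `classify` and the `Outcome` values are hypothetical, not part of webarchive-commons or Heritrix.

```java
/**
 * Hypothetical sketch: classify an FTP reply by its first digit (RFC 959) instead
 * of throwing, so the fetcher can record the outcome. Not Heritrix/webarchive-commons API.
 */
public class FtpReplySketch {

    public enum Outcome {
        PRELIMINARY,        // 1yz, e.g. "150 Binary data connection ..."
        SUCCESS,            // 2yz
        INTERMEDIATE,       // 3yz
        TRANSIENT_FAILURE,  // 4yz
        PERMANENT_FAILURE,  // 5yz, e.g. "550 foo: Permission denied."
        UNKNOWN             // not a reply code covered by RFC 959
    }

    /** Returns the leading three-digit code of a reply line, or -1 if there is none. */
    public static int replyCode(String replyLine) {
        if (replyLine == null || replyLine.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            char c = replyLine.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        return Integer.parseInt(replyLine.substring(0, 3));
    }

    /** Maps a reply code to an outcome using only its first digit. */
    public static Outcome classify(int code) {
        if (code < 100 || code > 599) {
            return Outcome.UNKNOWN;
        }
        switch (code / 100) {
            case 1: return Outcome.PRELIMINARY;
            case 2: return Outcome.SUCCESS;
            case 3: return Outcome.INTERMEDIATE;
            case 4: return Outcome.TRANSIENT_FAILURE;
            default: return Outcome.PERMANENT_FAILURE;
        }
    }

    public static void main(String[] args) {
        String reply = "550 foo: Permission denied.";
        int code = replyCode(reply);
        System.out.println(code + " -> " + classify(code));
    }
}
```

    With an outcome like this in hand, the fetcher could keep the reply line for the proposed ARC header and skip the exception, which would also avoid the later NullPointerException in ARCWriterProcessor.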