java.lang.NullPointerException

JIRA | Michael Stack | 10 years ago
  1.

    From Kris: It's me again :-) I discovered a potential NPE when terminating a job. The Frontier waits for the threads to finish (at least it is supposed to), but, as the following stack trace shows, the CrawlController or, more likely, the CrawlScope (I'm unsure which) does not:

        java.lang.NullPointerException
            at org.archive.crawler.postprocessor.Postselector.schedule(Postselector.java:269)
            at org.archive.crawler.postprocessor.Postselector.handleLinkCollection(Postselector.java:358)
            at org.archive.crawler.postprocessor.Postselector.innerProcess(Postselector.java:166)
            at org.archive.crawler.framework.Processor.process(Processor.java:102)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:255)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:131)
        Exception in thread "ToeThread #5" java.lang.NullPointerException
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:137)

    The following patch 'handles' it, if in a very simplistic way: it just catches the NPE and decides that the URI is not within scope when this occurs. Maybe the CrawlController (which should not itself be null) should throw an EndedException from getScope() once the crawl has been terminated? It's not really a big bug, but we really should have the crawler finish at least semi-gracefully. I should note that it did not prevent the crawl reports from being written.

    - Kris

    Index: Postselector.java
    ===================================================================
    RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/crawler/postprocessor/Postselector.java,v
    retrieving revision 1.13
    diff -u -r1.13 Postselector.java
    --- Postselector.java  27 Oct 2004 00:47:23 -0000  1.13
    +++ Postselector.java  17 Nov 2004 15:45:25 -0000
    @@ -266,21 +266,26 @@
          * @return true if CandidateURI was accepted by crawl scope, false otherwise
          */
         private boolean schedule(CandidateURI caUri) {
    -        if(getController().getScope().accepts(caUri)) {
    -            logger.finer("Accepted: "+caUri);
    -            getController().getFrontier().schedule(caUri);
    -            return true;
    -        } else {
    -            // Run the curi through another set of filters to see
    -            // if we should log it to the scope rejection log.
    -            if (logger.isLoggable(Level.INFO)) {
    -                CrawlURI curi = (caUri instanceof CrawlURI)?
    -                    (CrawlURI)caUri: new CrawlURI(caUri.getUURI());
    -                if (filtersAccept(this.rejectLogFilters, curi)) {
    -                    logger.info("Rejected " + curi.getUURI().toString());
    +        try{
    +            if(getController().getScope().accepts(caUri)) {
    +                logger.finer("Accepted: "+caUri);
    +                getController().getFrontier().schedule(caUri);
    +                return true;
    +            } else {
    +                // Run the curi through another set of filters to see
    +                // if we should log it to the scope rejection log.
    +                if (logger.isLoggable(Level.INFO)) {
    +                    CrawlURI curi = (caUri instanceof CrawlURI)?
    +                        (CrawlURI)caUri: new CrawlURI(caUri.getUURI());
    +                    if (filtersAccept(this.rejectLogFilters, curi)) {
    +                        logger.info("Rejected " + curi.getUURI().toString());
    +                    }
    +                }
                     }
                 }
    +        } catch(NullPointerException e){
    +            // Return false if this happens. Most likely the crawl is ending.
             }
    +        return false;
         }
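
    Kris's closing suggestion is to have the controller signal the end of the crawl explicitly instead of leaving callers to trip over a null scope. That idea can be sketched as below. This is a minimal, self-contained sketch under stated assumptions. `Scope`, `Controller`, `EndedException` and `ScopeScheduler` are simplified hypothetical stand-ins, not the actual Heritrix classes or signatures, and URIs are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the crawl scope (not the Heritrix CrawlScope API).
interface Scope {
    boolean accepts(String uri);
}

// Hypothetical checked exception meaning "the crawl has ended".
class EndedException extends Exception {
    EndedException(String message) {
        super(message);
    }
}

// Hypothetical controller: after terminate(), getScope() throws
// EndedException instead of handing back null.
class Controller {
    private volatile Scope scope;
    private final List<String> frontier = new ArrayList<>();

    Controller(Scope scope) {
        this.scope = scope;
    }

    Scope getScope() throws EndedException {
        Scope s = scope; // read the volatile field once
        if (s == null) {
            throw new EndedException("crawl has ended");
        }
        return s;
    }

    void terminate() {
        scope = null;
    }

    synchronized void schedule(String uri) {
        frontier.add(uri);
    }

    synchronized int frontierSize() {
        return frontier.size();
    }
}

// Hypothetical, simplified version of the schedule() logic from the patch.
class ScopeScheduler {
    private final Controller controller;

    ScopeScheduler(Controller controller) {
        this.controller = controller;
    }

    // True if the URI was accepted and scheduled; false if it was rejected
    // or the crawl has already ended.
    boolean schedule(String uri) {
        Scope scope;
        try {
            scope = controller.getScope();
        } catch (EndedException e) {
            return false; // crawl is ending: treat the URI as out of scope
        }
        if (scope.accepts(uri)) {
            controller.schedule(uri);
            return true;
        }
        return false;
    }
}
```

    The patch catches `NullPointerException` around the whole method. By contrast, this version makes the ended state part of `getScope()`'s contract. A genuine NPE elsewhere in `schedule` (for example, a bug in a reject-log filter) would therefore still surface instead of being silently treated as an out-of-scope URI.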

    JIRA | 10 years ago | Michael Stack
  2.

    NullPointerException at org.archive.crawler.processor.recrawl.PersistLogProcessor.finalTasks(PersistLogProcessor.java:87)

        03/09/2009 17:07:46 +0000 INFO org.archive.crawler.admin.CrawlJob postDeregister org.archive.crawler:host=crawling10.us.archive.org,jmxport=9093,mother=h1236289378518,name=1104-20090309170725217,type=CrawlService.Job unregistered from MBeanServerId=crawling10.us.archive.org_1236143748023, SpecificationVersion=1.4, ImplementationVersion=1.6.0_03-b05, SpecificationVendor=Sun Microsystems
        Exception in thread "ToeThread #75: " java.lang.NullPointerException
            at org.archive.crawler.processor.recrawl.PersistLogProcessor.finalTasks(PersistLogProcessor.java:87)
            at org.archive.crawler.framework.CrawlController.runProcessorFinalTasks(CrawlController.java:1676)
            at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1031)
            at org.archive.crawler.admin.CrawlJob$MBeanCrawlController.completeStop(CrawlJob.java:801)
            at org.archive.crawler.framework.CrawlController.toeEnded(CrawlController.java:1817)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:186)
        Exception in thread "ToeThread #63: " java.lang.RuntimeException: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:86)
            at org.archive.crawler.framework.CrawlController.runProcessorFinalTasks(CrawlController.java:1676)
            at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1031)
            at org.archive.crawler.admin.CrawlJob$MBeanCrawlController.completeStop(CrawlJob.java:801)
            at org.archive.crawler.framework.CrawlController.toeEnded(CrawlController.java:1817)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:186)
        Caused by: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.je.Database.checkRequiredDbState(Database.java:1458)
            at com.sleepycat.je.Database.sync(Database.java:424)
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:83)
            ... 5 more
        Exception in thread "ToeThread #61: " java.lang.RuntimeException: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:86)
            at org.archive.crawler.framework.CrawlController.runProcessorFinalTasks(CrawlController.java:1676)
            at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1031)
            at org.archive.crawler.admin.CrawlJob$MBeanCrawlController.completeStop(CrawlJob.java:801)
            at org.archive.crawler.framework.CrawlController.toeEnded(CrawlController.java:1817)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:186)
        Caused by: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.je.Database.checkRequiredDbState(Database.java:1458)
            at com.sleepycat.je.Database.sync(Database.java:424)
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:83)
            ... 5 more
        Exception in thread "ToeThread #64: " java.lang.RuntimeException: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:86)
            at org.archive.crawler.framework.CrawlController.runProcessorFinalTasks(CrawlController.java:1676)
            at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1031)
            at org.archive.crawler.admin.CrawlJob$MBeanCrawlController.completeStop(CrawlJob.java:801)
            at org.archive.crawler.framework.CrawlController.toeEnded(CrawlController.java:1817)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:186)
        Caused by: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.je.Database.checkRequiredDbState(Database.java:1458)
            at com.sleepycat.je.Database.sync(Database.java:424)
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:83)
            ... 5 more
        Exception in thread "ToeThread #60: " java.lang.RuntimeException: com.sleepycat.je.DatabaseException: (JE 3.3.75) Can't call Database.sync: Database state can't be DbState.CLOSED must be DbState.OPEN
            at org.archive.crawler.processor.recrawl.PersistOnlineProcessor.finalTasks(PersistOnlineProcessor.java:86)
            at org.archive.crawler.framework.CrawlController.runProcessorFinalTasks(CrawlController.java:1676)
            at org.archive.crawler.framework.CrawlController.completeStop(CrawlController.java:1031)
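
    Each of these traces follows the same call path: `ToeThread.run` → `CrawlController.toeEnded` → `CrawlJob$MBeanCrawlController.completeStop` → `CrawlController.completeStop` → `runProcessorFinalTasks`. That path shows up separately for ToeThreads #75, #63, #61, #64 and #60. This suggests the stop sequence may be entered by several exiting threads, so processor final tasks can run again after the BerkeleyDB databases have been closed. One way to rule that out is to make the stop sequence run exactly once.

    The sketch below is hypothetical. `Store`, `StoppableController` and their methods are invented stand-ins. `Store.sync()` only mimics the "state can't be CLOSED" check; it does not call the real `com.sleepycat.je` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a persistent store; sync() fails once closed,
// mimicking the "Database state can't be CLOSED" check in the log.
class Store {
    private volatile boolean open = true;
    final AtomicInteger syncCount = new AtomicInteger();

    void sync() {
        if (!open) {
            throw new IllegalStateException("store is closed");
        }
        syncCount.incrementAndGet();
    }

    void close() {
        open = false;
    }
}

// Hypothetical controller whose stop sequence may be triggered by every
// exiting worker thread, as toeEnded() -> completeStop() is in the log.
class StoppableController {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final List<Runnable> finalTasks = new ArrayList<>();
    private final Store store;

    StoppableController(Store store) {
        this.store = store;
        finalTasks.add(store::sync); // e.g. a persist processor flushing state
    }

    // Only the first caller wins compareAndSet; later callers return
    // immediately instead of re-running final tasks on a closed store.
    void completeStop() {
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        for (Runnable task : finalTasks) {
            task.run(); // final tasks run while the store is still open
        }
        store.close(); // shared resources are closed last
    }
}
```

    Callers that lose the `compareAndSet` return at once, possibly before the winning thread has finished closing resources. If a caller needs to wait for shutdown to complete, a latch or similar barrier would have to be added.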

    JIRA | 8 years ago | Noah Levitt
  3.

    FTP entries in an ARC file currently look like this:

        ftp://ftp.ksl.stanford.edu/welcome.msg 171.64.71.195 20081121190026 no-type 56
        ***** ***** Stanford Knowledge Systems Laboratory *****

    There is no header, only body content. When Heritrix encounters an error while trying to download a file, for example:

        550 foo: Permission denied.

    it throws an exception, which propagates to the logs:

        11/21/2008 19:00:45 +0000 SEVERE org.archive.crawler.fetcher.FetchFTP innerProcess FTP server reported problem.
        org.archive.net.FTPException: FTP error code: 550
            at org.archive.net.ClientFTP.openDataConnection(ClientFTP.java:130)
            at org.archive.crawler.fetcher.FetchFTP.fetch(FetchFTP.java:312)
            at org.archive.crawler.fetcher.FetchFTP.innerProcess(FetchFTP.java:252)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

    Heritrix still tries to write to the ARC, but fails because there is no content:

        11/21/2008 19:00:45 +0000 SEVERE org.archive.crawler.framework.ToeThread recoverableProblem Problem java.lang.NullPointerException occured when trying to process 'ftp://ftp.ksl.stanford.edu/dev/ticotsord' at step ABOUT_TO_BEGIN_PROCESSOR in Archiver
        java.lang.NullPointerException
            at org.archive.crawler.writer.ARCWriterProcessor.innerProcess(ARCWriterProcessor.java:122)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

    So there is no record in the ARC file at all. But "550 foo: Permission denied." is essentially equivalent to an HTTP 403. It should be archived somehow, and it should not spew stack traces into the logs. So I propose that we include a "header" section in ARC records for FTP transactions; "550 foo: Permission denied." would go there. On a successful get, the message would be something like "150 Binary data connection for /welcome.msg (76.103.251.45,57342) (56 bytes)." Would this break anything?
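
    The proposal amounts to storing the FTP control-connection reply the way HTTP response headers are already stored: the reply line first, then the retrieved bytes, if any. The length field in the ARC record line would count both. The sketch below builds such a record for illustration only. Several details are assumptions, not an agreed Heritrix format: the CRLF after the reply (FTP control replies end in CRLF), the extra blank CRLF line separating it from the body (borrowed from HTTP), and the class and method names. `FtpArcRecord` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical builder for an ARC v1-style record that carries an FTP
// "header" section (the server's reply line) ahead of the body.
final class FtpArcRecord {
    private FtpArcRecord() {}

    static byte[] build(String url, String ip, String date14,
                        String ftpReply, byte[] body) {
        // Assumed layout: reply line + CRLF, blank line (CRLF), then body.
        byte[] header = (ftpReply + "\r\n\r\n").getBytes(StandardCharsets.US_ASCII);
        int length = header.length + body.length; // header now counts too

        // Record line, as in the example above:
        // URL, IP address, 14-digit date, content type, length.
        String recordLine = url + " " + ip + " " + date14 + " no-type " + length + "\n";
        byte[] line = recordLine.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(line, 0, line.length);
        out.write(header, 0, header.length);
        out.write(body, 0, body.length);
        out.write('\n'); // newline separating this record from the next
        return out.toByteArray();
    }
}
```

    With this layout, a "550 foo: Permission denied." reply yields a record whose content is only that reply plus the separator. That is analogous to archiving the headers of an HTTP 403 response with an empty body, so the ARC writer would have something to write instead of failing on missing content.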

    JIRA | 8 years ago | Noah Levitt
  4.

    The following exception stack trace occurred when terminating a small test crawl via the web UI. A subsequent crawl with the same settings terminated normally.

        com.sleepycat.util.RuntimeExceptionWrapper: (JE 3.2.23) Can't open a cursor Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.collections.StoredContainer.convertException(StoredContainer.java:447)
            at com.sleepycat.collections.BlockIterator.hasNext(BlockIterator.java:380)
            at org.apache.commons.httpclient.cookie.CookieSpecBase.match(CookieSpecBase.java:607)
            at org.apache.commons.httpclient.HttpMethodBase.addCookieRequestHeader(HttpMethodBase.java:1193)
            at org.apache.commons.httpclient.HttpMethodBase.addRequestHeaders(HttpMethodBase.java:1327)
            at org.apache.commons.httpclient.HttpMethodBase.writeRequestHeaders(HttpMethodBase.java:2056)
            at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1939)
            at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1000)
            at org.archive.httpclient.HttpRecorderGetMethod.execute(HttpRecorderGetMethod.java:116)
            at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
            at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
            at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
            at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
            at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:500)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)
        Caused by: com.sleepycat.je.DatabaseException: (JE 3.2.23) Can't open a cursor Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.je.Database.checkRequiredDbState(Database.java:1069)
            at com.sleepycat.je.Database.openCursor(Database.java:359)
            at com.sleepycat.collections.CurrentTransaction.openCursor(CurrentTransaction.java:364)
            at com.sleepycat.collections.MyRangeCursor.openCursor(MyRangeCursor.java:53)
            at com.sleepycat.collections.MyRangeCursor.<init>(MyRangeCursor.java:30)
            at com.sleepycat.collections.DataCursor.init(DataCursor.java:171)
            at com.sleepycat.collections.DataCursor.<init>(DataCursor.java:59)
            at com.sleepycat.collections.BlockIterator.hasNext(BlockIterator.java:299)
            ... 15 more
        07/05/2007 21:02:25 +0000 SEVERE org.archive.crawler.framework.ToeThread recoverableProblem Problem com.sleepycat.util.RuntimeExceptionWrapper: (JE 3.2.23) Can't open a cursor Database state can't be DbState.CLOSED must be DbState.OPEN occured when trying to process 'http://www.landsbokasafn.is/Apps/WebObjects/HI.woa/wa/header_logo_neg.gif' at step ABOUT_TO_BEGIN_PROCESSOR in HTTP
        com.sleepycat.util.RuntimeExceptionWrapper: (JE 3.2.23) Can't open a cursor Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.collections.StoredContainer.convertException(StoredContainer.java:447)
            at com.sleepycat.collections.BlockIterator.hasNext(BlockIterator.java:380)
            at org.apache.commons.httpclient.cookie.CookieSpecBase.match(CookieSpecBase.java:607)
            at org.apache.commons.httpclient.HttpMethodBase.addCookieRequestHeader(HttpMethodBase.java:1193)
            at org.apache.commons.httpclient.HttpMethodBase.addRequestHeaders(HttpMethodBase.java:1327)
            at org.apache.commons.httpclient.HttpMethodBase.writeRequestHeaders(HttpMethodBase.java:2056)
            at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1939)
            at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1000)
            at org.archive.httpclient.HttpRecorderGetMethod.execute(HttpRecorderGetMethod.java:116)
            at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
            at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
            at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
            at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
            at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:500)
            at org.archive.crawler.framework.Processor.process(Processor.java:112)
            at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)
        Caused by: com.sleepycat.je.DatabaseException: (JE 3.2.23) Can't open a cursor Database state can't be DbState.CLOSED must be DbState.OPEN
            at com.sleepycat.je.Database.checkRequiredDbState(Database.java:1069)
            at com.sleepycat.je.Database.openCursor(Database.java:359)
            at com.sleepycat.collections.CurrentTransaction.openCursor(CurrentTransaction.java:364)
            at com.sleepycat.collections.MyRangeCursor.openCursor(MyRangeCursor.java:53)
            at com.sleepycat.collections.MyRangeCursor.<init>(MyRangeCursor.java:30)
            at com.sleepycat.collections.DataCursor.init(DataCursor.java:171)
            at com.sleepycat.collections.DataCursor.<init>(DataCursor.java:59)
            at com.sleepycat.collections.BlockIterator.hasNext(BlockIterator.java:299)
            ... 15 more
        07/05/2007 21:02:25 +0000 SEVERE org.archive.crawler.framework.ToeThread run Fatal exception in ToeThread #29: http://www.landsbokasafn.is/Apps/WebObjects/HI.woa/wa/header_logo_neg.gif
        java.lang.NullPointerException
            at org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
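
    This trace shows a ToeThread still iterating the cookie collection (`CookieSpecBase.match` → `BlockIterator.hasNext`) after the BerkeleyDB database behind it had been closed by the termination request. A common way to keep a shared resource from being closed under an in-flight reader is a read/write lock. Users hold the read lock while they work with the resource, and `close()` takes the write lock. The close therefore waits for them, and later users see a "closed" flag instead of a dead handle.

    The sketch below shows that technique with a hypothetical `GuardedStore` that wraps an in-memory map in place of a database. It is not how Heritrix's cookie storage is actually implemented.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical guard: an in-memory map stands in for a database-backed
// collection such as a cookie store.
class GuardedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private boolean closed = false; // only accessed while holding the lock

    // Users take the *read* lock: many may proceed at once, but none can
    // overlap with close(). Returns false once the store is closed.
    boolean put(String key, String value) {
        lock.readLock().lock();
        try {
            if (closed) {
                return false;
            }
            data.put(key, value);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Returns empty after close instead of failing part-way through a read.
    Optional<String> get(String key) {
        lock.readLock().lock();
        try {
            if (closed) {
                return Optional.empty();
            }
            return Optional.ofNullable(data.get(key));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Takes the write lock, so it blocks until in-flight users finish.
    void close() {
        lock.writeLock().lock();
        try {
            closed = true;
            data.clear(); // stands in for closing the underlying database
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isClosed() {
        lock.readLock().lock();
        try {
            return closed;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

    For an iterator-based consumer like the cookie match loop, the read lock would have to be held across the whole iteration, not just per call. Returning "nothing" after close is one choice. Throwing a dedicated end-of-crawl exception that the ToeThread recognizes would be another, and it would avoid the follow-on `NullPointerException` in `ToeThread.run` being the only signal.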

    JIRA | 9 years ago | Kristinn Sigurðsson


    Root Cause Analysis

    java.lang.NullPointerException (no message provided)
        at org.archive.crawler.postprocessor.Postselector.schedule(Postselector.java:269)
        at org.archive.crawler.postprocessor.Postselector.handleLinkCollection(Postselector.java:358)
        at org.archive.crawler.postprocessor.Postselector.innerProcess(Postselector.java:166)
        at org.archive.crawler.framework.Processor.process(Processor.java:102)
        at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:255)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:131)